From SGaudet at turbotekcomputer.com Thu Nov 1 05:55:17 2001
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Wed Nov 25 01:01:50 2009
Subject: AMD testing
Message-ID: <3450CC8673CFD411A24700105A618BD6170EE5@911TURBO>

Hello Don,

> I had a problem with a Tyan 2460 that I got when they first came out.
> It seems that the board actually only supports 3 memory modules even
> though it has 4 slots. (??!!) I had the board in a cluster with several
> other Tyan 2462 board machines, all with 4 256MB memory modules. We
> were doing testing on the system, and the machine with the 2460 was
> giving garbage results for a calculation that used about 700MB of
> memory. All smaller jobs had tested OK. I couldn't find the problem
> (went through different memory, kernels, etc.), then I started
> thinking ... why does Tyan say the board only supports 3GB of memory?
> Sure enough, when I took 1 module out of the 2460 machine, it ran the
> big test job correctly. I tested this on a newer order of the
> motherboards and they seemed to be OK. The markings on the motherboard
> still say "A" but it looks a little different; the old one had dots
> around it. (?) I don't know if this is an isolated problem, just a bad
> board ... ??? However, I have seen other complaints about memory
> problems with the 2460. Also, I discovered that the sockets are REALLY
> fussy about how you insert the modules. If you don't get them just
> right, memtest86 will generate errors on the modules even though they
> test good on other boards. I assume you had 4 512MB modules in your
> machine; I suggest you try leaving DIMM4 empty and testing the system
> again.
>
> I let Tyan know about the problem but haven't received a response.

I've seen this problem before with Tyan motherboards. A year ago we had
the same issue with the S2510NG ThunderLE (dual NIC, 4MB ATI graphics,
no SCSI). The problem was hardware-related: you couldn't install
anything in the fourth memory slot and expect to see it. My guess is
it's another hardware-related problem. I'd try smaller-density RAM first
and see if the fourth slot is working at all.

Cheers,

Steve Gaudet
..... <(???)>
==========
Turbotek Computer Corp.
8025 South Willow St.
Manchester, NH 03103
toll free: 800-573-5393
tel: 603-666-3062 ext. 21
fax: 603-666-4519
e-mail: sgaudet@turbotekcomputer.com
web: http://www.turbotekcomputer.com

From lindahl at conservativecomputer.com Thu Nov 1 07:45:56 2001
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Wed Nov 25 01:01:50 2009
Subject: AMD testing
In-Reply-To: <0GM3004KF97Y0W@mta5.rcsntx.swbell.net>; from kinghorn@pqs-chem.com on Wed, Oct 31, 2001 at 03:06:19PM -0600
References: <0GM3004KF97Y0W@mta5.rcsntx.swbell.net>
Message-ID: <20011101104556.C10893@wumpus.foo>

On Wed, Oct 31, 2001 at 03:06:19PM -0600, Donald B. Kinghorn wrote:

> I had a problem with a Tyan 2460 that I got when they first came out.
> It seems that the board actually only supports 3 memory modules even
> though it has 4 slots. (??!!)

Capacitance problem. I had some Alpha LX boards that were like that:
256MB DIMMs weren't on the compatibility list, not because they didn't
work at all, but because you couldn't fill all the slots with them.
Well, I've had machines running reliably for years that had a mix of
256 and 128MB DIMMs.

BTW, are you sure you're using registered memory instead of unbuffered?
One of the features of cheaper, unbuffered memory is that you can't use
that many of them.
This probably isn't the case, because I don't think 3 or even 2 of those
would necessarily work...

greg

From agrajag at scyld.com Thu Nov 1 07:29:40 2001
From: agrajag at scyld.com (Sean Dilda)
Date: Wed Nov 25 01:01:50 2009
Subject: a shell script to spawn mpi executables on the "free" nodes of a Scyld cluster
In-Reply-To: <3BE062CB.C474552D@aviion.univ-lemans.fr>; from fcalvay@aviion.univ-lemans.fr on Wed, Oct 31, 2001 at 09:44:59PM +0100
References: <3BE062CB.C474552D@aviion.univ-lemans.fr>
Message-ID: <20011101102940.A25971@blueraja.scyld.com>

On Wed, 31 Oct 2001, Florent Calvayrac wrote:

> to those with the same problem
>
> Since I couldn't find any free programs to easily address this issue,
> I include below a dirty bash2 script to spawn mpi executables on the
> "free" nodes of a Scyld cluster.
>
> Comments and feedback welcome

This script is just spawning jobs on the nodes that are using less cpu
time, right? If you are using our latest release, -8, mpich
automatically uses beomap to map which nodes the jobs go to, and
beomap's default behavior is to automatically map the jobs to the nodes
that have the lowest cpu usage. Is there something to this script that
I'm missing that mpich with beomap doesn't do for you?

From raysonlogin at yahoo.com Thu Nov 1 08:18:55 2001
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Wed Nov 25 01:01:50 2009
Subject: [PBS-USERS] PVFS
In-Reply-To: <002301c162ed$b2055670$990c2a80@batman>
Message-ID: <20011101161855.33405.qmail@web11407.mail.yahoo.com>

I think this discussion belongs on the beowulf mailing list. Anyway,
back to your question, please read the sample chapter:

http://www.oreilly.com/catalog/clusterlinux/chapter/ch09.html

Rayson

--- Brent Clements wrote:
> This is waaaay off the subject here....but anyone using PVFS? And why
> would I want to use it in my linux cluster?
>
> -Brent Clements

From kinghorn at pqs-chem.com Thu Nov 1 09:45:19 2001
From: kinghorn at pqs-chem.com (Donald B. Kinghorn)
Date: Wed Nov 25 01:01:50 2009
Subject: AMD testing
Message-ID: <0GM40000YUL0GY@mta5.rcsntx.swbell.net>

... the memory modules I'm using are Crucial Registered ECC PC2100 ...
should be good ... and Tyan lists using 4 of these modules as a "tested"
configuration.

I'm disappointed that I didn't get a response from Tyan, since this is a
serious issue. I would much rather have a system fail outright than just
produce erroneous results for some problems. I'll be glad to see some
other vendors enter the dual Athlon market.

I should note again that the newer 2460 boards I received seem to be OK.
However, I would urge anyone using these boards to do thorough testing.

-Don

From gabriel.weinstock at dnamerican.com Thu Nov 1 13:09:16 2001
From: gabriel.weinstock at dnamerican.com (Gabriel J. Weinstock)
Date: Wed Nov 25 01:01:50 2009
Subject: Linksys EG1064
Message-ID: <01110116091602.01763@patagonia.dnamerican.com>

Is the Linksys GigE EG1064 NIC well supported under Linux? I scoured the
web for information and was able to find that it has a driver for
FreeBSD, but I didn't find it in the hardware compatibility lists on
RedHat or SuSE's sites.
I would just like to know before purchasing.

Thanks in advance,

-gabriel

From hahn at physics.mcmaster.ca Thu Nov 1 13:14:14 2001
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Wed Nov 25 01:01:50 2009
Subject: AMD testing
In-Reply-To: <20011101104556.C10893@wumpus.foo>
Message-ID:

> > I had a problem with a Tyan 2460 that I got when they first came
> > out. It seems that the board actually only supports 3 memory modules
> > even though it has 4 slots. (??!!)
>
> Capacitance problem. I had some Alpha LX boards that were like that:

perhaps. people often don't realize that a dimm can be single- or
double-sided, though: double-sided uses up two "banks" that the chipset
supports. so, for instance, 4 slots might well be reasonable for a
chipset that supports 6 banks, since you might be able to put 4
single-sided dimms in the slots and have them work. (6 banks is fairly
common among chipsets I've looked at recently. note that these 'banks'
are different from the banks internal to a single chip...)

> 256 MB DIMMs weren't on the compatibility list, not because they

a 256M dimm that consisted of 8 256Mb parts would count as one-sided,
for instance, but one that had 16 128Mb parts would be two-sided.
offhand, I'd guess that all reg/buf dimms count as one-sided, but I
suppose two-sided might exist, too, and be 1 gate cheaper...

regards, mark hahn.

From zadok at phreaker.net Thu Nov 1 14:48:45 2001
From: zadok at phreaker.net (Hereward Cooper)
Date: Wed Nov 25 01:01:50 2009
Subject: AMD testing (fwd)
In-Reply-To:
References:
Message-ID: <20011101224845.20ec5cbe.zadok@phreaker.net>

> > I had a problem with a Tyan 2460 that I got when they first came
> > out. It seems that the board actually only supports 3 memory modules
> > even though it has 4 slots. (??!!) I had the board in a cluster with
> > several other Tyan 2462 board machines, all with 4 256MB memory
> > modules. We were doing testing on the system, and the machine with
> > the 2460 was giving garbage results for a calculation that used
> > about 700MB of memory. All smaller jobs had tested OK. I couldn't
> > find the problem (went through different memory, kernels, etc.),
> > then I started thinking ... why does Tyan say the board only
> > supports 3GB of memory? Sure enough, when I took 1 module out of the
> > 2460 machine, it ran the big test job correctly. I tested this on a
> > newer order of the motherboards and they seemed to be OK. The
> > markings on the motherboard still say "A" but it looks a little
> > different; the old one had dots around it. (?) I don't know if this
> > is an isolated problem, just a bad board ... ??? However, I have
> > seen other complaints about memory problems with the 2460. Also, I
> > discovered that the sockets are REALLY fussy about how you insert
> > the modules. If you don't get them just right, memtest86 will
> > generate errors on the modules even though they test good on other
> > boards. I assume you had 4 512MB modules in your machine; I suggest
> > you try leaving DIMM4 empty and testing the system again.
> >
> > I let Tyan know about the problem but haven't received a response.

The manual that comes with the board does have a table showing the
possible combinations of memory, though it does state that it doesn't
list them all. But it still gives you an idea.
Thanks,

Hereward

From Florent.Calvayrac at univ-lemans.fr Fri Nov 2 01:24:15 2001
From: Florent.Calvayrac at univ-lemans.fr (Florent Calvayrac)
Date: Wed Nov 25 01:01:50 2009
Subject: a shell script to spawn mpi executables on the "free" nodes of a Scyld cluster
References: <3BE062CB.C474552D@aviion.univ-lemans.fr> <20011101102940.A25971@blueraja.scyld.com>
Message-ID: <3BE2663F.2394621F@univ-lemans.fr>

> This script is just spawning jobs on the nodes that are using less cpu
> time, right?

yes... but you'll admit that until this release, the problem was
present, and it was discussed here about two months ago.

> If you are using our latest release, -8, mpich automatically uses
> beomap to map which nodes the jobs go to, and beomap's default
> behavior is to automatically map the jobs to the nodes that have the
> lowest cpu usage.

I am glad to learn it; this had escaped my attention. We are indeed
using the -7 release, since it takes some time to come from LinuxCentral
to here on CD... and since you are a commercial company and (as far as I
know) only release a large bunch of source packages on the FTP site, I
estimated that the whole compilation and installation time was too high
and decided to stay with -7 until -8 is available on LinuxCentral.

--
Florent Calvayrac
Laboratoire de Physique de l'Etat Condense
http://www.univ-lemans.fr/~fcalvay
Universite du Maine-Faculte des Sciences
72085 Le Mans Cedex 9

From rajkumar at csse.monash.edu.au Fri Nov 2 01:38:21 2001
From: rajkumar at csse.monash.edu.au (Rajkumar Buyya)
Date: Wed Nov 25 01:01:50 2009
Subject: Info on: comp.distributed
Message-ID: <3BE2698D.FED3B8E9@csse.monash.edu.au>

Dear All,

FYI, discussions are currently underway for the creation of an
unmoderated Usenet newsgroup called comp.distributed, to address grid
and peer-to-peer (even cluster) issues, or any other issues related to
collectives of network-connected distributed resources. Discussions are
taking place on the news.groups newsgroup, and voting is expected to
begin in mid-November. The current draft of the Request for Discussion
(RFD), which includes a description of the charter, can be found on
news.announce.newgroups or attached below.

Thanks
Raj

-------------------------------------------------------------------
REQUEST FOR DISCUSSION (RFD): unmoderated newsgroup comp.distributed

This is a formal Request For Discussion (RFD) for the creation of the
world-wide unmoderated Usenet newsgroup comp.distributed. This is not a
Call for Votes (CFV); you cannot vote at this time. Procedural details
are below.

Newsgroup line:
comp.distributed        Distributed Resource Sharing and Exploitation.

CHANGES from previous RFD: This is an updated version of the previously
submitted RFD for comp.p2p-grid. It addresses many comments and concerns
raised during discussion of the earlier RFD, including the
recommendation of a new name, comp.distributed.

RATIONALE: comp.distributed

Networks in general, and the internet specifically, have been evolving
from star topologies of thin clients or dumb terminals connected to
central servers, to a collection of highly connected nodes, many having
significant compute, storage, and peripherals, along with human
presence. Likewise, internet tools and protocols have evolved from being
primarily a mechanism to "push" (via email) or "pull" (via web-browser)
untyped data, into supporting more interactive, semantic, and
bi-directional relationships.
These changes have prompted different communities to (re-)explore the
potential of sharing and exploiting collections of heterogeneous,
geographically distributed resources such as computers, data, people,
and scientific instruments in a secure and consistent manner, usually
lacking any central control or authority. These efforts are often
described with terms like "peer-to-peer" ("p2p") and "grids", and can
serve to virtualize enterprises by blurring the significance of physical
location.

Different communities tend to focus on different varieties of resources,
different overall objectives and constraints, and different degrees of
permanence of the resource collectives. For example, "grid" communities
will often consider large, semi-permanent (though dynamically
constituted) collections of world-class resources that can be accessed
much as utilities, to provide unprecedented capabilities that enable,
for example, large-scale problems in science, engineering, and commerce.
"p2p" communities, on the other hand, often seek on-demand temporary
relationships between everyday personal computers, devices, and
peripherals "at the edge of the network", that help to solve every-day
problems of sharing, collaboration, and computing in more efficient,
convenient, and economical ways. Similar relationships have been
explored over time in areas related to human collaboration, distributed
databases, distributed search, parallel and distributed computing, web
services, and hierarchical content delivery networks.

In spite of these differences, all of these communities share a large
number of challenges as a direct result of attempting to effectively and
synergistically assemble and use these collectives of heterogeneous
distributed resources. These challenges include:

* Lack of any central authority, leading to the potential unannounced
  availability or withdrawal of resources, requiring fault-tolerant
  applications and complicating the discovery and scheduling of
  resources.

* Heterogeneous resources, requiring methods to recognize and request
  unique functionality when needed, while hiding unexploitable resource
  differences behind consistent interfaces.

* Heterogeneous performance in those resources, prompting the use of
  simulation and performance modeling to determine which resources to
  use when.

* Heterogeneous requirements from both resource owners and end users in
  terms of their objectives, quality of service, and computational
  economy.

* Unpredictable and dynamic network topology and properties, requiring
  the ability to portably deal with differing latency and bandwidth
  constraints (e.g. hiding latency while minimizing overhead) and
  motivating quality of service (QoS) mechanisms.

* A complex and unpredictable concurrent environment, requiring general
  approaches to program development that hide these features while
  leveraging existing tools, languages, and techniques wherever
  possible.

* A memory hierarchy that can extend to the memory and disk throughout
  the collective, prompting a reconsideration of traditional data
  storage and caching approaches.

* The potential presence of untrusted resources and/or actors, requiring
  decentralized approaches to privacy, authorization, authentication,
  anonymity, and the determination of levels of acceptable risk
  associated with different operational modes.
* Achieving return on investment for both resource users and providers,
  requiring approaches for auditable accounting and reimbursement as
  well as the consideration of cost/price as a resource selection
  parameter.

* Impediments to connectivity, including firewalls and oversubscribed
  scarce network resources (such as dial-in modems, and IP addresses
  shared through network address translation/IP masquerading).

* Cross-organizational IT involvement, requiring flexible and
  politically acceptable policies, procedures, and management tools.

* Evaluating and proposing mechanisms and policies for the protection of
  intellectual property in an environment explicitly designed to
  facilitate instant sharing.

* Understanding and exploiting the potential value of these resource
  collectives, including effective collaboration strategies, integration
  of mixed resource types into problem solving environments, novel
  application areas and solution approaches enabled by this environment,
  and the use of automated agents.

Already, international academic and commercial forums like:

* Global Grid Forum:
* Peer to Peer Computing WG:
* Universal Plug-n-Play Forum
* New Productivity Initiative

have evolved to create standards and protocols for inter-operability
between heterogeneous systems providing virtual services. Recently,
infrastructure projects like the NSF Distributed TeraScale Facility have
focused even more attention, and include involvement from several
companies. Many computer and/or software vendors, large and small, have
recently announced specific projects or general priorities in p2p and/or
grids, including IBM, Intel, DSTC, Sun, and Microsoft. Some details on
these and other projects can be found at:

* http://www.gridcomputing.com/
* http://www.computer.org/dsonline/gc/index.htm
* http://www.peertal.com/
* http://www.nwfusion.com/
* http://www.peerintelligence.com/
* http://www.openp2p.com/

Although over 20 discussion mailing lists operated by individuals or
institutions exist, they are generally intended for discussion of
specific group priorities, and strongly segregate the p2p and grid
communities, even when addressing similar issues. Another concern is
that mailing lists are likely to generate a large volume of email for
members; therefore, moderators will often discourage use of these lists
for general or controversial discussion, and many prospective
participants feel discouraged from subscribing, do not become members,
and do not join important topical discussions. We believe that having a
newsgroup where people can participate in discussions of their own
choosing, when they want, without getting swamped with emails, will help
overcome these limitations and will encourage discussion and
dissemination without the need of explicit membership.

While some existing newsgroups, like comp.parallel and comp.sys.super,
touch on some specialized aspects of this topic, and will continue to do
so, this new group will serve as a focal point for considering the
inter-relationships, interactions, and synergies when combining these
separate technologies.

Strategy for publicising the comp.distributed newsgroup:

The formation of the comp.distributed newsgroup will be publicised
through the following channels (but not limited to):

* IEEE DS Online,
* Global Grid Forum,
* P2P WG,
* Grid Infoware,
* IEEE/ACM conferences:
  * CCGRID'xy: ,
  * GRID'xy: ,
* Yahoo Group on gridcomputing as part of GridInfoware.
* IEEE Task Force on Cluster Computing (TFCC)
* Newsgroups such as comp.parallel

END RATIONALE.
CHARTER: comp.distributed

Although the name "comp.distributed" has been chosen due to its
familiarity and convenience, the group is to be broader than just those
topics traditionally regarded as "distributed computing". Specifically,
topics are to include any unique issues relating to the creation and
exploitation of collectives of geographically distributed and
potentially heterogeneous resources such as computers, data/information
sources, peripherals, instruments, and humans.

Appropriate areas of discussion in this context would include (but are
not limited to):

* discovering, scheduling/brokering, and accessing remote resources
* exploitation of heterogeneous resources
* resource management, scheduling, and computational economy
* portable/adaptable communication substrates
* quality of service approaches
* portable program development tools, languages, techniques
* data management tools and techniques
* exploitation of distributed memory hierarchy
* decentralized security
* practical accounting, reimbursement, and business & revenue models
* overcoming impediments to wide-area connectivity
* cross-organizational policy issues and ways to address them
* mechanisms and policies for intellectual property
* programming tools, environments, and languages
* applications, collaboration, and distributed agents
* simulation and performance modelling
* comparisons of grid and p2p, and issues unique to each
* events, surveys, news and general announcements

It is expected that additional 3rd-level subgroups addressing some of
these topics or others may be created as dictated by the volume and
cohesiveness of resulting message traffic.

END CHARTER.

PROCEDURE:

This is a request for discussion, not a call for votes. In this phase of
the process, any potential problems with the proposed newsgroups should
be raised and resolved. The discussion period will continue for a
minimum of 21 days (starting from when the first RFD for this proposal
is posted to news.announce.newgroups), after which a Call For Votes
(CFV) will be posted by a neutral vote taker. Please do not attempt to
vote until this happens.

All discussion of this proposal should be posted to news.groups.

This RFD attempts to comply fully with the Usenet newsgroup creation
guidelines outlined in "How to Create a New Usenet Newsgroup" and "How
to Format and Submit a New Group Proposal". Please refer to these
documents (available in news.announce.newgroups) if you have any
questions about the process.

END PROCEDURE.

DISTRIBUTION: comp.distributed

This RFD has been posted to the following newsgroups:

news.announce.newgroups
news.groups
comp.arch
comp.parallel
comp.parallel.pvm
comp.parallel.mpi
comp.sys.super
comp.client-server

and to the following mailing lists:

END DISTRIBUTION.

Proponent: Rajkumar Buyya
Proponent: David C. DiNucci

From agrajag at scyld.com Fri Nov 2 05:07:30 2001
From: agrajag at scyld.com (Sean Dilda)
Date: Wed Nov 25 01:01:50 2009
Subject: a shell script to spawn mpi executables on the "free" nodes of a Scyld cluster
In-Reply-To: <3BE2663F.2394621F@univ-lemans.fr>; from Florent.Calvayrac@univ-lemans.fr on Fri, Nov 02, 2001 at 10:24:15AM +0100
References: <3BE062CB.C474552D@aviion.univ-lemans.fr> <20011101102940.A25971@blueraja.scyld.com> <3BE2663F.2394621F@univ-lemans.fr>
Message-ID: <20011102080730.A27515@blueraja.scyld.com>

On Fri, 02 Nov 2001, Florent Calvayrac wrote:

> > This script is just spawning jobs on the nodes that are using less
> > cpu time, right?
>
> yes... but you'll admit that until this release, the problem was
> present, and it was discussed here about two months ago.

Yes. We saw the problem discussed on the list, which is one of the
reasons we made beomap: to solve the problem. I appreciate you sending
out a fix for the problem; I just wanted to let you know that we already
have our own solution that works without running an extra script.

(In case you're curious, beomap actually pulls the cpu load info from
libbeostat.)

From Florent.Calvayrac at univ-lemans.fr Fri Nov 2 08:49:11 2001
From: Florent.Calvayrac at univ-lemans.fr (Florent.Calvayrac)
Date: Wed Nov 25 01:01:50 2009
Subject: a shell script to spawn mpi executables on the "free" nodes of a Scyld cluster
In-Reply-To: <20011102080730.A27515@blueraja.scyld.com> from "Sean Dilda" at Nov 02, 2001 08:07:30 AM
Message-ID: <200111021649.RAA11656@pecbip1.univ-lemans.fr>

I had a look at the sources of mpprun, thinking of using libbeostat
indeed, but could not figure out a way to avoid having one process
running on the master node. Is this problem fixed in the -8 release of
Scyld?

--
Florent Calvayrac                          | Tel : 02 43 83 26 26
Laboratoire de Physique de l'Etat Condense | Fax : 02 43 83 35 18
UMR-CNRS 6087                              | http://www.univ-lemans.fr/~fcalvay
Universite du Maine-Faculte des Sciences   | 72085 Le Mans Cedex 9

From derek.richardson at pgs.com Fri Nov 2 10:34:12 2001
From: derek.richardson at pgs.com (Derek Richardson)
Date: Wed Nov 25 01:01:50 2009
Subject: Linksys EG1064
Message-ID: <1004726053.1914.128.camel@idoru.hstn.tensor.pgs.com>

Gabriel,

Can't speak for the Linksys card, but I've had good experiences w/ Intel
gigabit ethernet. I'd love to give you the model #, but it's from IBM,
so it's their model #'s. It runs off the e1000 kernel module, though,
and handles our NFS load quite well.

Regards,

Derek R.

--
Junior Linux Geek
713-817-1197 (cell)
713-781-4000 x2267 (office)
"Linux users, fanatical. No way... HEY! Get that MCSE up on the altar,
Tux must be appeased!"

From SThomaso at phmining.com Fri Nov 2 11:25:22 2001
From: SThomaso at phmining.com (Scott Thomason)
Date: Wed Nov 25 01:01:50 2009
Subject: Compile farm?
Message-ID:

Greetings. I'm interested in setting up a shell account/batch
process/compile farm system for our developers, and I'm wondering if
Beowulf clusters are well suited to that task. We're not interested in
writing parallel code using PVM or MPI; we just want to log into what
appears to be one big server and have it dispatch the workload amongst
the slave processors. Is Beowulf good at that?

---scott

p.s. Sorry if there are duplicates of this message; I used the wrong
email address earlier.

From ron_chen_123 at yahoo.com Fri Nov 2 11:44:54 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:50 2009
Subject: Compile farm?
In-Reply-To:
Message-ID: <20011102194454.28360.qmail@web14707.mail.yahoo.com>

What you need is a batch system.

There are 2 free batch systems, SGE and PBS. Both of them are open
source, but you can nevertheless get 7x24 support if you are willing to
pay.

PBS:
www.openpbs.com
www.pbspro.com

SGE:
www.sun.com/gridware
gridengine.sunsource.net

Also, SGE has qmake, which can execute several instances of make on
multiple machines for a single make job.
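For example, a distributed build can be submitted like this (a sketch
only -- the flags shown are illustrative, the parallel environment named
"make" is an assumption of this example, and exact options can differ
between SGE releases, so check the qmake documentation for your
version):

  # -cwd runs the build in the current directory, -v PATH exports your
  # PATH to the execution hosts, and -pe make 1-8 requests up to 8 slots
  # in a parallel environment named "make" (site-specific assumption).
  # Everything after "--" is passed to make itself.
  qmake -cwd -v PATH -pe make 1-8 -- -j 8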
Install note:
http://supportforum.sun.com/gridengine/appnote_install.html

-Ron

--- Scott Thomason wrote:
> Greetings. I'm interested in setting up a shell account/batch
> process/compile farm system for our developers, and I'm wondering if
> Beowulf clusters are well suited to that task. We're not interested in
> writing parallel code using PVM or MPI; we just want to log into what
> appears to be one big server and have it dispatch the workload amongst
> the slave processors. Is Beowulf good at that?
>
> ---scott
>
> p.s. Sorry if there are duplicates of this message; I used the wrong
> email address earlier.

From bremner at unb.ca Fri Nov 2 12:42:07 2001
From: bremner at unb.ca (David Bremner)
Date: Wed Nov 25 01:01:50 2009
Subject: Compile farm?
In-Reply-To: <20011102194454.28360.qmail@web14707.mail.yahoo.com>
References: <20011102194454.28360.qmail@web14707.mail.yahoo.com>
Message-ID: <15331.1311.605294.128939@convex.cs.unb.ca>

Ron Chen writes:
> What you need is a batch system.
>
> There are 2 free batch systems, SGE and PBS.
>
[good info snipped]

It is not obvious that a batch system is the best answer to this
particular problem.

Mosix (www.mosix.org) may be more appropriate for providing a single
system image.

From raysonlogin at yahoo.com Fri Nov 2 13:03:58 2001
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Wed Nov 25 01:01:50 2009
Subject: Compile farm?
In-Reply-To: <15331.1311.605294.128939@convex.cs.unb.ca>
Message-ID: <20011102210358.80694.qmail@web11401.mail.yahoo.com>

IMO, either Mosix or a batch system can do the job.

However, Mosix requires patching/recompiling the kernel, and the recent
changes in the VM make running a non-standard kernel troublesome.

For the case of a batch system, the system admin only needs to install
the package; no recompiling of the kernel. And users can even submit
jobs from their workstations. The jobs are queued until there are
resources for them to run.

Rayson

> It is not obvious that a batch system is the best answer to this
> particular problem.
>
> Mosix (www.mosix.org) may be more appropriate for providing a single
> system image.
>
> From the point of view of efficient use of resources, a batch system
> is probably important.

From zadok at phreaker.net Sat Nov 3 13:26:05 2001
From: zadok at phreaker.net (Hereward Cooper)
Date: Wed Nov 25 01:01:50 2009
Subject: [ot] Re: AMD testing
Message-ID: <20011103212605.20c9b83d.zadok@phreaker.net>

Hi there,

Has any user of the Tiger MP S2460 had experience of what happens if you
DON'T use registered memory? Will it blow up :-) ??

Thanks,

Hereward

--
What, never seen a signature file before?

From xyzzy at speakeasy.org Sat Nov 3 17:11:52 2001
From: xyzzy at speakeasy.org (Trent Piepho)
Date: Wed Nov 25 01:01:50 2009
Subject: [ot] Re: AMD testing
In-Reply-To: <20011103212605.20c9b83d.zadok@phreaker.net>
Message-ID:

On Sat, 3 Nov 2001, Hereward Cooper wrote:

> Hi there,
>
> Has any user of the Tiger MP S2460 had experience of what happens if
> you DON'T use registered memory? Will it blow up :-) ??
There was a review of this board when it came out, at Tom's Hardware or
AnandTech, I'm not sure which. They tested registered vs. non-registered
memory. If you use more than two DIMM slots, you need registered. Three
non-registered DIMMs won't work, and two non-registered plus one
registered won't work either. It doesn't explode or catch fire (only if
the heatsink falls off...), but it won't pass POST. Registered ECC is
only a couple dollars more than non-registered ECC, so there really is
no reason not to get it.

From math at velocet.ca Sat Nov 3 18:10:31 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:50 2009
Subject: [ot] Re: AMD testing
In-Reply-To: <20011103212605.20c9b83d.zadok@phreaker.net>; from zadok@phreaker.net on Sat, Nov 03, 2001 at 09:26:05PM +0000
References: <20011103212605.20c9b83d.zadok@phreaker.net>
Message-ID: <20011103211031.D27471@velocet.ca>

On Sat, Nov 03, 2001 at 09:26:05PM +0000, Hereward Cooper's all...

> Hi there,
>
> Has any user of the Tiger MP S2460 had experience of what happens if
> you DON'T use registered memory? Will it blow up :-) ??

No, it just doesn't work. I got a couple systems booted, but when I
typed 'cat' I got a 'signal 9' error, which I've never seen before.
Later, I booted again and my bash login shell wouldn't start - 'signal
11' error. A third time logging in, login(1) died with a signal 11 as
well. It just doesn't work.

On www.crucial.com registered DDR ECC RAM is only $4 or $5 more per
stick.

/kc

> Thanks,
>
> Hereward
>
> --
> What, never seen a signature file before?

--
Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA

From math at velocet.ca Sat Nov 3 18:14:39 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:50 2009
Subject: Tyan Tiger MP (was Re: [ot] Re: AMD testing)
In-Reply-To: ; from xyzzy@speakeasy.org on Sat, Nov 03, 2001 at 05:11:52PM -0800
References: <20011103212605.20c9b83d.zadok@phreaker.net>
Message-ID: <20011103211439.E27471@velocet.ca>

On Sat, Nov 03, 2001 at 05:11:52PM -0800, Trent Piepho's all...

> On Sat, 3 Nov 2001, Hereward Cooper wrote:
> > Hi there,
> >
> > Has any user of the Tiger MP S2460 had experience of what happens if
> > you DON'T use registered memory? Will it blow up :-) ??
>
> There was a review of this board when it came out, at Tom's Hardware
> or AnandTech, I'm not sure which. They tested registered vs.
> non-registered memory. If you use more than two DIMM slots, you need
> registered. Three

I had one DIMM in, 256MB, running FreeBSD.

> non-registered DIMMs won't work, and two non-registered plus one
> registered won't work either. It doesn't explode or catch fire (only
> if the heatsink falls off...), but it won't pass POST. Registered ECC
> is only a couple dollars more than non-registered ECC, so there really
> is no reason not to get it.

It passed POST no problem, got through all the rc files, but then
started dying as soon as I logged in. Your mileage may vary.

BTW, do NOT use 300W power supplies. I blew 2 trying. You need 30A on
the +5V line. The 350W PSs I got do 32A on +5 and work great (and seem
to be of higher quality altogether too).

Watch out with the heatsinks you use on the Tyan Tiger; Golden Orbs do
NOT FIT with all the caps surrounding the CPUs. Use square or WIDE
(rectangular) heatsinks. A long one or a circular one just won't fit.
BTW - anyone have experience running non-MP Athlons on these boards? I
booted it with a couple and ran various jobs (dnetc, gromacs, g98, a
bunch of compile jobs of said programs, as well as FreeBSD and Linux
kernels among other things) and I've had no problems yet.

/kc

--
Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA

From mark at markrichman.com Sun Nov 4 09:01:28 2001
From: mark at markrichman.com (Mark A. Richman)
Date: Wed Nov 25 01:01:50 2009
Subject: Web based process accounting
Message-ID: <000001c16552$5886d6c0$6801a8c0@yoda>

Are there any web front ends to PBS or process accounting tools?

Thanks,
Mark Richman

From zadok at phreaker.net Sun Nov 4 09:31:48 2001
From: zadok at phreaker.net (Hereward Cooper)
Date: Wed Nov 25 01:01:50 2009
Subject: Tyan Tiger MP (was: Re: [ot] Re: AMD testing)
In-Reply-To: <200111041701.fA4H1E020173@blueraja.scyld.com>
References: <200111041701.fA4H1E020173@blueraja.scyld.com>
Message-ID: <20011104173148.714fcfea.zadok@phreaker.net>

once upon a time (actually it was more like Sun, 4 Nov 2001 12:01:14
-0500), beowulf-request@beowulf.org said:

> BTW, do NOT use 300W power supplies. I blew 2 trying. You need 30A on
> the +5V line. The 350W PSs I got do 32A on +5 and work great (and seem
> to be of higher quality altogether too).

thanks for the tip, shame I didn't know before, as I went and bought a
300W one yesterday that only does 25A on the +5V line :-( but at least
it only cost £15.

> Watch out with the heatsinks you use on the Tyan Tiger; Golden Orbs do
> NOT FIT with all the caps surrounding the CPUs. Use square or WIDE
> (rectangular) heatsinks. A long one or a circular one just won't fit.

The Akasa Icicle 765's I got with my mobo work great (or as far as I can
currently tell, as the machine hasn't been running for more than 20
seconds, but they fit tight and fully cover the chip + more).

> BTW - anyone have experience running non-MP Athlons on these boards? I
> booted it with a couple and ran various jobs (dnetc, gromacs, g98, a
> bunch of compile jobs of said programs, as well as FreeBSD and Linux
> kernels among other things) and I've had no problems yet.

Sounds promising, did you get any noticeable drop in performance?

Thanks,

Hereward

--
What, never seen a signature file before?

From eric at fnordsystems.com Sun Nov 4 10:19:35 2001
From: eric at fnordsystems.com (Eric Kuhnke)
Date: Wed Nov 25 01:01:50 2009
Subject: Tyan Tiger MP (was: Re: [ot] Re: AMD testing)
In-Reply-To: <20011104173148.714fcfea.zadok@phreaker.net>
Message-ID:

£15 power supplies of any variety are invariably garbage... there are
plenty of 300W power supplies in the $30-$35 range from larger Taiwanese
manufacturers that put out 30A on the 5V wire. But then, the price diff
between 300 and 350W is often $5, so go with the higher wattage.

An excellent Athlon/AthlonXP/AthlonMP cooler is the Dynatron
DC1206BM-L; it measures 60x60mm (horizontally) and uses a unique
micro-fin design. I've had very good results with it on the 1.53GHz
Palomino core CPUs. This HSF costs around $20 each.
URL: http://www.dynatron-corp.com/proddetail.asp?cid=6&sku=DC1206BM-L

> > BTW - anyone have experience running non-MP Athlons on these boards?
> > I booted it with a couple and ran various jobs (dnetc, gromacs, g98,
> > a bunch of compile jobs of said programs, as well as FreeBSD and
> > Linux kernels among other things) and I've had no problems yet.
>
> Sounds promising, did you get any noticeable drop in performance?

I know it's possible to run dual Athlon-C (Thunderbird) 1.4GHz CPUs on
the Tyan S2460, but it's not advisable unless your budget is really
limited. The Palominos (AthlonMP/AthlonXP) perform at least 15% better
in many FPU-intensive tasks.

Eric Kuhnke
Lead Engineer / Operations Manager
Fnord Datacenter Systems Inc.
eric@fnordsystems.com
www.fnordsystems.com
voice: +1-360-527-3301

From jakob at unthought.net Sun Nov 4 12:29:12 2001
From: jakob at unthought.net (Jakob Østergaard)
Date: Wed Nov 25 01:01:50 2009
Subject: Compile farm?
In-Reply-To: <15331.1311.605294.128939@convex.cs.unb.ca>; from bremner@unb.ca on Fri, Nov 02, 2001 at 04:42:07PM -0400
References: <20011102194454.28360.qmail@web14707.mail.yahoo.com> <15331.1311.605294.128939@convex.cs.unb.ca>
Message-ID: <20011104212912.W14001@unthought.net>

On Fri, Nov 02, 2001 at 04:42:07PM -0400, David Bremner wrote:

> Ron Chen writes:
> > What you need is a batch system.
> >
> > There are 2 free batch systems, SGE and PBS.
> >
> [good info snipped]
>
> It is not obvious that a batch system is the best answer to this
> particular problem.
>
> Mosix (www.mosix.org) may be more appropriate for providing a single
> system image.

I tried this with Mosix. Problem is - Mosix migrates jobs after a while.
Initially a compiler takes up a few megabytes of memory, but "after a
while" it has grown to hundreds of megabytes. When Mosix decides to
migrate the compiler, it will spend a long time on the network moving
the large process image.

There's a patch to make(1) that integrates it with Mosix, but I didn't
try that out.
Instead, I implemented http://unthought.net/antsd which will distribute
your compilers efficiently to the proper nodes. It's not very
sophisticated, but it does the job for me at least :)

--
................................................................
: jakob@unthought.net     : And I see the elder races,         :
:.........................: putrid forms of man                :
: Jakob Østergaard        : See him rise and claim the earth,  :
: OZ9ABN                  : his downfall is at hand.           :
:.........................:............{Konkhra}...............:

From jnellis at dslextreme.com Sun Nov 4 13:15:50 2001
From: jnellis at dslextreme.com (Joe Nellis)
Date: Wed Nov 25 01:01:50 2009
Subject: Scyld distro- Help with examples
Message-ID: <001701c16575$e1535b40$73f2a540@dslextreme.com>

Greetings,

I am writing a tutorial for a recently constructed Scyld Beowulf cluster
(-7 basic ed.) and I have some questions on the location of include
files. Basically I am having users copy /usr/mpi-beowulf/examples to
their home directory and then make/compile the examples there so they
can play with them. My problem comes with the hello++.cc example. The
include file is mpi++.h, which further asks for other includes in the
/usr/include/mpi-beowulf/ directory when they are actually located in
the /usr/include/mpi-beowulf/c++/ directory. Were these files supposed
to be stuffed into this 'c++' subdirectory for some reason, and is it
safe to move them up to the parent directory so the example can compile?

thanks,
Joe Nellis
jnellis@dslextreme.com
beowulf@cecs.csulb.edu

From jnellis at dslextreme.com Sun Nov 4 13:45:21 2001
From: jnellis at dslextreme.com (Joe Nellis)
Date: Wed Nov 25 01:01:50 2009
Subject: Using NFS with Scyld (-7 ver.)
Message-ID: <002501c1657a$00c4d680$73f2a540@dslextreme.com>

Greetings (again),

We are having problems getting the nodes to see users' home accounts.
Our master node mounts an NFS share for all /home files. We have changed
and uncommented the /etc/beowulf/fstab file so that MASTER =
192.168.10.251, which is the first NIC. After rebooting the nodes we did
a

  >> bpsh -a df

and saw that the nodes are mounting the master at 192.168.10.1:/home
(the second NIC). Doing a

  >> bpsh 4 ls home

lists all users, but any attempt to get details or dig farther down,

  >> bpsh 4 ls -al home
  or
  >> bpsh 4 ls home/jnellis

gives a file or directory not found error.

So I am wondering two things, since I am not a networking guy. Do we
have the MASTER= in the fstab pointed at the right IP address (I am
guessing it shouldn't point directly at the NFS server)? Secondly, is
there something we are missing that must allow requests for /home files
on the nodes to pass THROUGH the master? I ask this because the nodes
report mounting /home through the NIC #2 address while the node fstab
points at the NIC #1 address.

thanks,
Joe Nellis
jnellis@dslextreme.com

From agrajag at scyld.com Sun Nov 4 13:57:52 2001
From: agrajag at scyld.com (Sean Dilda)
Date: Wed Nov 25 01:01:50 2009
Subject: Scyld distro- Help with examples
In-Reply-To: <001701c16575$e1535b40$73f2a540@dslextreme.com>; from jnellis@dslextreme.com on Sun, Nov 04, 2001 at 01:15:50PM -0800
References: <001701c16575$e1535b40$73f2a540@dslextreme.com>
Message-ID: <20011104165752.A26086@blueraja.scyld.com>

On Sun, 04 Nov 2001, Joe Nellis wrote:

> Greetings,
>
> I am writing a tutorial for a recently constructed Scyld Beowulf
> cluster (-7 basic ed.) and I have some questions on the location of
> include files.
> Basically I am having users copy /usr/mpi-beowulf/examples to their
> home directory and then make/compile the examples there so they can
> play with them. My problem comes with the hello++.cc example. The
> include file is mpi++.h, which further asks for other includes in the
> /usr/include/mpi-beowulf/ directory when they are actually located in
> the /usr/include/mpi-beowulf/c++/ directory. Were these files supposed
> to be stuffed into this 'c++' subdirectory for some reason, and is it
> safe to move them up to the parent directory so the example can
> compile?

The C++ bindings for MPI do not work in -7. You will have to use -8 if
you want C++ to work with MPI.

Sean

From agrajag at scyld.com Sun Nov 4 14:09:56 2001
From: agrajag at scyld.com (Sean Dilda)
Date: Wed Nov 25 01:01:51 2009
Subject: Using NFS with Scyld (-7 ver.)
In-Reply-To: <002501c1657a$00c4d680$73f2a540@dslextreme.com>; from jnellis@dslextreme.com on Sun, Nov 04, 2001 at 01:45:21PM -0800
References: <002501c1657a$00c4d680$73f2a540@dslextreme.com>
Message-ID: <20011104170956.A26403@blueraja.scyld.com>

On Sun, 04 Nov 2001, Joe Nellis wrote:

> shouldn't point directly at the NFS server)? Secondly, is there
> something we are missing that must allow requests for /home files on
> the nodes to pass THROUGH the master? I ask this because the nodes
> report mounting /home through the NIC #2 address while the node fstab
> points at the NIC #1 address.

As far as I know, you cannot get a Linux box to mount an NFS filesystem,
then re-export it over NFS to another machine. So, as far as I know,
what you're asking is impossible.

I might suggest giving people their own home directories on the master
and just teaching them how to scp over the files they need.

From ron_chen_123 at yahoo.com Sun Nov 4 15:34:20 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:51 2009
Subject: Web based process accounting
In-Reply-To: <000001c16552$5886d6c0$6801a8c0@yoda>
Message-ID: <20011104233420.32143.qmail@web14708.mail.yahoo.com>

There is a package called PBSWeb, which provides a web GUI for PBS:

http://www.cs.ualberta.ca/~pinchak/PBSWeb/

-Ron

--- "Mark A. Richman" wrote:
> Are there any web front ends to PBS or process accounting tools?
>
> Thanks,
> Mark Richman

From ron_chen_123 at yahoo.com Sun Nov 4 15:40:00 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:51 2009
Subject: Fwd: Re: PBS/Veridian (Re: [PBS-USERS] Re: DRM standard API)
Message-ID: <20011104234000.39120.qmail@web14704.mail.yahoo.com>

Many closed-source companies claim that open-source products do not have
support. In fact, open-source tools can come with very good support.

-Ron

--- Gabriel Mateescu wrote:
> Date: Wed, 31 Oct 2001 11:58:42 -0500
> From: Gabriel Mateescu
>
> Indeed, Veridian-PBS stands out, due to a very prompt and competent
> technical support.
>
> Gabriel
>
> "Wilbur R. Johnson" wrote:
> >
> > I have to second this.
> > Being one of the folks who has sent money to Veridian, I am very
> > pleased with the support I have received.

From jtracy at ist.ucf.edu Mon Nov 5 07:52:07 2001
From: jtracy at ist.ucf.edu (Judd Tracy)
Date: Wed Nov 25 01:01:51 2009
Subject: Using NFS with Scyld (-7 ver.)
In-Reply-To: <20011104170956.A26403@blueraja.scyld.com>
Message-ID:

On Sun, 4 Nov 2001, Sean Dilda wrote:

> On Sun, 04 Nov 2001, Joe Nellis wrote:
>
> > shouldn't point directly at the NFS server)? Secondly, is there
> > something we are missing that must allow requests for /home files on
> > the nodes to pass THROUGH the master?
>
> As far as I know, you cannot get a Linux box to mount an NFS
> filesystem, then re-export it over NFS to another machine. So, as far
> as I know, what you're asking is impossible.

I believe that you can; you need to enable Sun compatibility for that. I
have not tested it, but I remember someone saying that you could.

> I might suggest giving people their own home directories on the master
> and just teaching them how to scp over the files they need.

--
Judd Tracy
Institute for Simulation and Training
jtracy@ist.ucf.edu

From alazur at plogic.com Mon Nov 5 09:44:26 2001
From: alazur at plogic.com (Adam Lazur)
Date: Wed Nov 25 01:01:51 2009
Subject: Using NFS with Scyld (-7 ver.)
In-Reply-To: <20011104170956.A26403@blueraja.scyld.com>
References: <002501c1657a$00c4d680$73f2a540@dslextreme.com> <20011104170956.A26403@blueraja.scyld.com>
Message-ID: <20011105124426.B12093@clustermonkey.org>

Sean Dilda (agrajag@scyld.com) said:
> As far as I know, you cannot get a Linux box to mount an NFS
> filesystem, then re-export it over NFS to another machine. So, as far
> as I know, what you're asking is impossible.

Exporting an NFS mount via NFS is possible if you use the user-space
nfsd (as opposed to the now-standard knfsd). The option for this is
somewhere in the manpages.

--
Adam Lazur
Special Forces, Paralogic Inc.

From Daniel.Kidger at quadrics.com Mon Nov 5 09:58:26 2001
From: Daniel.Kidger at quadrics.com (Daniel Kidger)
Date: Wed Nov 25 01:01:51 2009
Subject: Using NFS with Scyld (-7 ver.)
Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA739D2E1@stegosaurus.bristol.quadrics.com>

> As far as I know, you cannot get a Linux box to mount an NFS
> filesystem, then re-export it over NFS to another machine. So, as far
> as I know, what you're asking is impossible.

I have never seen re-exporting a directory work. What you can do is
routing with ipchains, so all nodes on a cluster's private ethernet can
mount a filesystem on an external system.

Also on the subject: I found that auto-mounting /home on demand (see
/etc/auto.misc) was much more reliable than trying to keep the mounts
permanently up.

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger@quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

From gkogan at students.uiuc.edu Mon Nov 5 10:19:06 2001
From: gkogan at students.uiuc.edu (german kogan)
Date: Wed Nov 25 01:01:51 2009
Subject: problems with Scyld
Message-ID:

Hi.

I am having problems with booting up slave nodes. Every time I try, I
get an error in the state column in BeoSetup. I looked in the log file
for that node and it said:

  setup_libs: Copying libraries to node 2...
  tar: lib/ld-2.1.3.so: Cannot write: No space left on device
  tar: Error exit delayed from previous errors
  Library copy to node failed. (rootfs=/rootfs)

I cleaned up, deleting most of the partitions on that node using the
fdisk utility from the Windows 98 startup disk, but it still gives me
the same error. If somebody can help me out I would greatly appreciate
it.

Thanks

From becker at scyld.com Mon Nov 5 11:03:06 2001
From: becker at scyld.com (Donald Becker)
Date: Wed Nov 25 01:01:51 2009
Subject: Using NFS with Scyld (-7 ver.)
In-Reply-To: <010C86D15E4D1247B9A5DD312B7F5AA739D2E1@stegosaurus.bristol.quadrics.com>
Message-ID:

On Mon, 5 Nov 2001, Daniel Kidger wrote:

> > As far as I know, you cannot get a Linux box to mount an NFS
> > filesystem, then re-export it over NFS to another machine. So, as
> > far as I know, what you're asking is impossible.
>
> I have never seen re-exporting a directory work.

It does work: I wrote the original user-level NFS server (unfsd) used by
Linux, and re-exporting was one of the primary advantages over the Sun
implementation. Having a per-client user ID map was another.

> What you can do is routing with ipchains, so all nodes on a cluster's
> private ethernet can mount a filesystem on an external system.

That's a better approach for most clusters; however, you can get better
caching when using re-export from the master. The NFS consistency
problem on writes affects either approach. We recommend only using NFS
for small read-only configuration files in /home.

Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993

From becker at scyld.com Mon Nov 5 11:05:25 2001
From: becker at scyld.com (Donald Becker)
Date: Wed Nov 25 01:01:51 2009
Subject: problems with Scyld
In-Reply-To:
Message-ID:

On Mon, 5 Nov 2001, german kogan wrote:

> I am having problems with booting up slave nodes. Every time I try, I
> get an error in the state column in BeoSetup. I looked in the log file
> for that node and it said:
>
>   setup_libs: Copying libraries to node 2...
>   tar: lib/ld-2.1.3.so: Cannot write: No space left on device
>   tar: Error exit delayed from previous errors
>   Library copy to node failed. (rootfs=/rootfs)

How much memory do you have on the slave nodes? If less than 64MB, you
will have to trim the library list. Or better, buy 128MB or 256MB DIMMs,
which are now the minimum that systems should economically have.
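A sketch of what trimming the list might look like (the library names
and versions below are illustrative, and the exact keywords can differ
between releases -- check the comments in your own /etc/beowulf/config
before editing):

  # /etc/beowulf/config (excerpt)
  # Each "libraries" line names files copied to the slave nodes' RAM
  # disk at boot; dropping entries your jobs never load shrinks the
  # footprint on small-memory nodes.
  libraries /lib/ld-2.1.3.so /lib/libc-2.1.3.so /lib/libm-2.1.3.so
  #libraries /usr/lib/libstdc++-libc6.1-2.so.3   # drop if no C++ jobs

New boot images pick up the change the next time the nodes come up.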
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993

From edwards at icantbelieveimdoingthis.com Mon Nov 5 12:05:27 2001
From: edwards at icantbelieveimdoingthis.com (Art Edwards)
Date: Wed Nov 25 01:01:51 2009
Subject: Memory on Scyld systems
Message-ID: <20011105130527.A32021@icantbelieveimdoingthis.com>

I have a question about memory on AMD-based clusters. I am now running a
homogeneous Scyld cluster with 768MB on each node. I have modified the
config file with a mem= command and have had no problems. Now I am
augmenting the cluster with new nodes that have 1.5GB of memory on each
node (single-processor nodes). Is there a way to use a different config
file for the new nodes? Also, I have heard that there have been problems
with 1.5GB of memory on some systems. Is this a consistent problem?

Art Edwards

P.S. I'm running Scyld 27Bz-7.

From jtracy at ist.ucf.edu Mon Nov 5 11:58:01 2001
From: jtracy at ist.ucf.edu (Judd Tracy)
Date: Wed Nov 25 01:01:51 2009
Subject: Athlon MP vs Athlon XP
In-Reply-To: <200111052011.XAA04207@nocserv.free.net>
Message-ID:

My understanding is that the only difference is that they are tested and
guaranteed to work in MP configurations. AMD has said that they will not
replace processors that do not work in MP configs unless they are MP
certified.

On Mon, 5 Nov 2001, Mikhail Kuzminsky wrote:

> Dear colleagues,
>
> I think about buying Tyan S2460 motherboards for a Beowulf. According
> to the data I have, Athlon XP (Palomino core) microprocessors can work
> successfully w/this mobo.
>
> But there are also Athlon MP microprocessors w/the same Palomino core,
> w/the same OPGA package, w/the same voltages, and w/the same
> frequencies beginning from 1333 (1500+). They cost, as I understand,
> more than the corresponding XP models.
>
> Sorry, what is the difference between MP and XP chips? Both, if my
> source was correct, support cache coherence.
>
> Yours,
> Mikhail Kuzminsky
> Zelinsky Institute of Organic Chemistry
> Moscow

--
Judd Tracy
Institute for Simulation and Training
jtracy@ist.ucf.edu

From math at velocet.ca Mon Nov 5 13:35:40 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:51 2009
Subject: Athlon MP vs Athlon XP
In-Reply-To: ; from jtracy@ist.ucf.edu on Mon, Nov 05, 2001 at 02:58:01PM -0500
References: <200111052011.XAA04207@nocserv.free.net>
Message-ID: <20011105163540.V27471@velocet.ca>

On Mon, Nov 05, 2001 at 02:58:01PM -0500, Judd Tracy's all...

> My understanding is that the only difference is that they are tested
> and guaranteed to work in MP configurations. AMD has said that they
> will not replace processors that do not work in MP configs unless they
> are MP certified.

AMD contracts out to do installations and guarantees them to this
degree? That's gotta be a pretty penny.

/kc

> On Mon, 5 Nov 2001, Mikhail Kuzminsky wrote:
>
> > Dear colleagues,
> >
> > I think about buying Tyan S2460 motherboards for a Beowulf.
> > According to the data I have, Athlon XP (Palomino core)
> > microprocessors can work successfully w/this mobo.
> >
> > But there are also Athlon MP microprocessors w/the same Palomino
> > core, w/the same OPGA package, w/the same voltages, and w/the same
> > frequencies beginning from 1333 (1500+).
They cost, as I understand, more than the corresponding > > XP models. > > > > Sorry, what is the difference between MP and XP chips ? Both, > > if my source was correct, support cache coherence. > > > > Yours > > Mikhail Kuzminsky > > Zelinsky Institute of Organic Chemistry > > Moscow > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
-- Judd Tracy Institute for Simulation and Training jtracy@ist.ucf.edu
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
-- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA
From eric at fnordsystems.com Mon Nov 5 14:41:38 2001 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: Message-ID: Of course, if an AthlonXP used in a dual-processor board ever dies, you can always say to AMD "Yes, we were using it in a #NAME_OF_SINGLE_PROCESSOR_MOTHERBOARD, and it randomly died". It's pretty rare for a CPU to fail by itself; 99.9% of the time it's the result of a heatsink fan failing, or the heatsink somehow coming loose from the socket. The other 0.1% is power surges, lightning strikes, and things like that... Eric Kuhnke Lead Engineer / Operations Manager Fnord Datacenter Systems Inc. eric@fnordsystems.com www.fnordsystems.com voice: +1-360-527-3301 > -----Original Message----- > From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On > Behalf Of Judd Tracy > Sent: Monday, November 05, 2001 11:58 AM > To: Mikhail Kuzminsky > Cc: beowulf@beowulf.org > Subject: Re: Athlon MP vs Athlon XP > > My understanding is the only difference is that they are tested and > guaranteed to work in MP configurations. AMD has said that > they will not > replace processors that do not work in MP configs unless they are MP > certified. > > On Mon, 5 Nov 2001, Mikhail Kuzminsky wrote: > > > Dear colleagues, > > > > I think about buying Tyan S2460 motherboards for Beowulf. > > According to the data I have, Athlon XP (Palomino core) microprocessors > > can work successfully w/these mobos. > > > > But there are also Athlon MP microprocessors w/same Palomino core, > > w/same OPGA package, w/same voltages and w/same frequencies beginning > > from 1333 (1500+). They cost, as I understand, more than the > corresponding > > XP models. > > > > Sorry, what is the difference between MP and XP chips ? Both, > > if my source was correct, support cache coherence.
> > > > Yours > > Mikhail Kuzminsky > > Zelinsky Institute of Organic Chemistry > > Moscow > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
-- Judd Tracy Institute for Simulation and Training jtracy@ist.ucf.edu
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From edwards at icantbelieveimdoingthis.com Mon Nov 5 14:50:37 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:51 2009 Subject: Problems with Scyld Message-ID: <20011105155037.A32324@icantbelieveimdoingthis.com> I am attempting to install 16 new nodes on an existing Scyld network (Scyld 27Bz-7) with little success. The new nodes have 3com905CX ethernet cards. When I attempt to use the standard Scyld tools, boot the slave node, drag the new MAC address to the list and click apply, nothing happens. The slave node continues to issue RARP attempts. When I build one of the new nodes into a head node and attempt the same process, the MAC address does not appear in the new addresses column. It seems as if the new ethernet cards can send, but not receive. Any help would be appreciated. Art Edwards
From becker at scyld.com Mon Nov 5 15:53:31 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:51 2009 Subject: Problems with Scyld In-Reply-To: <20011105155037.A32324@icantbelieveimdoingthis.com> Message-ID: On Mon, 5 Nov 2001, Art Edwards wrote: > I am attempting to install 16 new nodes on an existing Scyld network > (Scyld 27Bz-7) with little success. The new nodes have 3com905CX ethernet > cards. [...] It seems as if the new ethernet cards > can send, but not receive. The new "CX" cards require an updated driver. The update is in 27*z-8, or you can compile the driver update set from ftp://ftp.scyld.com/pub/network/netdrivers-3.0-1.src.rpm You'll have to create new boot images and second stage images. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993
From kalyanakrishna at yahoo.com Tue Nov 6 01:51:11 2001 From: kalyanakrishna at yahoo.com (Chadalavada Kalyana Krishna) Date: Wed Nov 25 01:01:51 2009 Subject: ch_p4 Error -> System Hangs Message-ID: <20011106095111.19316.qmail@web10507.mail.yahoo.com> Hello all, I am working on a 7 node Linux cluster (6 compute nodes, 1 FS). I tried to run a simple Hello World program. The C program went through without any glitches. When I tried the same in FORTRAN, the system from which the program was started hung. I could not trace the problem to any s/w or installation issue, though I am not sure about it. Repeated attempts to run the same resulted in hanging of n09, n11, n13, n14, n15. I was not able to ping the systems. But I also do not understand why n10 did not hang, though I ran the program there too. The display is: Code: some numbers.
Aiee: Killed interrupt handler Kernel panic: Interrupt handler not syncing One important point is that we have configured mpich to use ssh instead of rsh for communication. with regards, Kalyan.Ch ===== ------------------------------------------------------------ Ch.Kalyana Krishna, Parallel Processing Group, National PARAM Super Computing Facility, Center for Development of Advanced Computing, Pune University Campus, Pune - 411 007, India. Ph: Off:+91-20-5694080 Res: +91-20-589255
From edwards at icantbelieveimdoingthis.com Tue Nov 6 07:30:19 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:51 2009 Subject: Problems with Scyld In-Reply-To: ; from becker@scyld.com on Mon, Nov 05, 2001 at 06:53:31PM -0500 References: <20011105155037.A32324@icantbelieveimdoingthis.com> Message-ID: <20011106083019.B1353@icantbelieveimdoingthis.com> Thanks very much for the reply. I'm trying to blend AMD nodes with 1.5 GB of memory with existing AMD nodes with 0.75 GB. There is a config file in /etc/beowulf that feeds either the second or third boot phase and contains a mem= command. Is there a way within Scyld to specify a different config file for different nodes? Art Edwards On Mon, Nov 05, 2001 at 06:53:31PM -0500, Donald Becker wrote:
From becker at scyld.com Tue Nov 6 07:55:35 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:51 2009 Subject: ch_p4 Error -> System Hangs In-Reply-To: <20011106095111.19316.qmail@web10507.mail.yahoo.com> Message-ID: On Tue, 6 Nov 2001, Chadalavada Kalyana Krishna wrote: > I am working on a 7 node Linux cluster (6 compute > nodes, 1 FS). What system? (Kernel version, etc.) > the system from which the program was started hung. I > could not trace the problem to any s/w or installation > issue, though I am not sure about it. > > Repeated attempts to run the same resulted in hanging > of n09, n11, n13, n14, n15. I was not able to ping > the systems. But I also do not understand why n10 did > not hang, though I ran the program there too. > > The display is: > > Code: some numbers. > > Aiee: Killed interrupt handler You have a kernel crash. Given that it didn't occur on all systems, you should look first for a hardware problem, especially memory corruption. > One important point is that we have configured mpich > to use ssh instead of rsh for communication. This is likely not related to a kernel crash. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993
From agrajag at scyld.com Tue Nov 6 08:16:36 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:51 2009 Subject: Memory on Scyld systems In-Reply-To: <20011105130527.A32021@icantbelieveimdoingthis.com>; from edwards@icantbelieveimdoingthis.com on Mon, Nov 05, 2001 at 01:05:27PM -0700 References: <20011105130527.A32021@icantbelieveimdoingthis.com> Message-ID: <20011106111636.A28908@blueraja.scyld.com> On Mon, 05 Nov 2001, Art Edwards wrote: > I have a question about memory on AMD-based clusters. I am now running > a homogeneous Scyld cluster with 768MB on each node. I have modified the > config file with a mem= command and have had no problems. Now I am augmenting > the cluster with new nodes that have 1.5 GB of memory on each node (single > processor nodes). Is there a way to use a different config file for the new > nodes? The kernel command line is stored in the bootfile /var/beowulf/boot.img. By default, this image is sent to all nodes, but if the file /var/beowulf/boot.img.<node> (i.e. boot.img.0) exists, it will use that image for the given node. You can create this image by modifying /etc/beowulf/config.boot, then running beoboot -2 -n <node> -o /var/beowulf/boot.img.<node>. Once you've created one copy of the new bootfile, you should be able to symlink or hardlink the other filenames to it so that boot.img.<node> for all your new nodes points to it. Just remember, whenever you change anything with your bootfiles in the future, you're going to have to remake both files.
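To make that recipe concrete, a worked example for a hypothetical node 5 -- the node numbers, the link targets and the mem= value are illustrative only; the beoboot flags are as given above:

# put the new kernel command line (e.g. mem=1536M) in the boot config
vi /etc/beowulf/config.boot
# build a phase-2 boot image just for node 5
beoboot -2 -n 5 -o /var/beowulf/boot.img.5
# point the other new 1.5GB nodes at the same image
ln /var/beowulf/boot.img.5 /var/beowulf/boot.img.6
ln /var/beowulf/boot.img.5 /var/beowulf/boot.img.7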
From snguyen at hotmail.com Tue Nov 6 22:08:38 2001 From: snguyen at hotmail.com (Son Nguyen) Date: Wed Nov 25 01:01:51 2009 Subject: problems with scyld - slave nodes Message-ID: >Message: 4 >Date: Mon, 5 Nov 2001 12:19:06 -0600 (CST) >From: german kogan >To: >Subject: problems with Scyld > >I am having problems with booting up slave nodes. Every time I try to do >it I get an error in the state column in the BeoSetup. I looked in the log >file for that node and it said >" setup_libs: Copying libraries to node 2... >tar:lib/ld-2.1.3.so: Cannot write: No space left on device >tar: Error exit delayed from previous errors >Library copy to node failed. (rootfs=/rootfs)" >[...] German, it is not RAM, it is partition allocation: your / partition is not big enough. Here is a suggestion:

fat    50mb           beoboot
swap   256mb          swap
/      rest (1.4gig)  rest

I have also found out that on certain testing of the filesystem, I can load 100% of the / partition. After a reboot, the slave node does not allow the full active state due to lack-of-space issues. Good luck Sonny Nguyen Senior Networking and Distributed Systems Engineer The Mitre Corporation
>Message: 12 >Date: Mon, 5 Nov 2001 15:50:37 -0700 >To: beowulf@beowulf.org >Subject: Problems with Scyld >From: Art Edwards > >I am attempting to install 16 new nodes on an existing Scyld network >(Scyld 27Bz-7) with little success. The new nodes have 3com905CX ethernet >cards. [...] It seems as if the new ethernet >cards can send, but not receive. > >Any help would be appreciated. > >Art Edwards Art, 1) there is something wrong with the server. 2) take a look in /var/beowulf to see if the file unknown_addresses exists; if not, touch the file and retry the client node. Good Luck Sonny Nguyen Senior Networking and Distributed Systems Engineer The Mitre Corporation
Here is a new question. I have just received 27cz-8a. Built the server. When trying to boot the slave nodes, the server sees each one, accepts it and distributes an IP for the client without any intervention. The slave node then failed to do the second boot phase...with the error... neighbor table overflow.... this is a fresh install. I have never seen this on 27bz-7 Sonny Nguyen
From agrajag at scyld.com Tue Nov 6 14:24:55 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:51 2009 Subject: problems with scyld - slave nodes In-Reply-To: ; from snguyen@hotmail.com on Tue, Nov 06, 2001 at 10:08:38PM +0000 References: Message-ID: <20011106142455.C19207@kotako.analogself.com> On Tue, 06 Nov 2001, Son Nguyen wrote: > German, it is not RAM, it is partition allocation: your / partition is not > big enough. [...] > I have also found out that on certain testing of the filesystem, I can load > 100% of the / partition. After a reboot, the slave node does not allow the > full active state due to lack-of-space issues. The problem is with the / partition. However, on a fresh install of Scyld Beowulf, the / partition is a ramdisk, which means running out of RAM may be the problem. It won't try to put anything on the hard drive of the slave node until you bring the nodes up with ramdisks and partition them, then change /etc/beowulf/fstab.
From javier.iglesias at freesurf.ch Wed Nov 7 07:11:31 2001 From: javier.iglesias at freesurf.ch (Javier Iglesias) Date: Wed Nov 25 01:01:51 2009 Subject: ExtremeNetworks Summit and channel bonding with Scyld Message-ID: <1005145891.webexpressdV3.1.f@smtp.freesurf.ch> Hi all, We are about to build a 19 bi-athlon/Tyan Tiger/Scyld cluster for academic research in the field of genetic programming, and large neural networks. We'd like to use an Extreme Networks Summit 48 ethernet switch -> http://www.extremenetworks.com/products/datasheets/summit24.asp connecting (highly recommended here recently :) Netgear FA310TX NICs -> http://www.netgear.com/product_view.asp?xrp=1&yrp=1&zrp=4 Here come the questions : * has anyone experienced channel bonding on that switch ? * any Gigabit NIC recommendation for the master node ? * is it possible/necessary to channel bond Gigabit interfaces ? Thanks in advance for your help !! --javier
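For readers new to Javier's question, here is what plain-kernel channel bonding looks like, independent of Scyld -- the interface names and addresses are made up, and per Florent Calvayrac's note further on, stock Scyld needs beoboot and kernel changes on top of this:

# Linux 2.4 bonding sketch: load the driver, bring up the bond,
# then enslave two fast-ethernet NICs
modprobe bonding
ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1
# the matching switch ports must be configured as an aggregated trunk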
From Kian_Chang_Low at vdgc.com.sg Wed Nov 7 08:01:48 2001 From: Kian_Chang_Low at vdgc.com.sg (Kian_Chang_Low@vdgc.com.sg) Date: Wed Nov 25 01:01:51 2009 Subject: Promise SX6000 vs Adaptec 2400A Message-ID: Hi all, With 3ware exiting the IDE RAID storage card market, I was looking for a replacement for a cluster and came across the Promise Supertrak SX6000 and Adaptec ATA RAID 2400A. I have no experience with the above and hope someone might shed some light. 1) Which card has the better support for Linux? I had heard that Promise is not very Linux-friendly and tends to lock the user to older kernels. Is that true? 2) Does anyone have experience putting more than one Promise card in a system? Is it possible? 3) Are there any other alternatives? Thanks, Kian Chang.
From cblack at eragen.com Wed Nov 7 08:17:29 2001 From: cblack at eragen.com (Chris Black) Date: Wed Nov 25 01:01:51 2009 Subject: ExtremeNetworks Summit and channel bonding with Scyld In-Reply-To: <1005145891.webexpressdV3.1.f@smtp.freesurf.ch>; from javier.iglesias@freesurf.ch on Wed, Nov 07, 2001 at 04:11:31PM +0100 References: <1005145891.webexpressdV3.1.f@smtp.freesurf.ch> Message-ID: <20011107111729.B7496@getafix.EraGen.com> On Wed, Nov 07, 2001 at 04:11:31PM +0100, Javier Iglesias wrote: > We are about to build a 19 bi-athlon/Tyan Tiger/Scyld cluster > for academic research in the field of genetic programming, and > large neural networks. [...] > * has anyone experienced channel bonding on that switch ? > * any Gigabit NIC recommendation for the master node ? > * is it possible/necessary to channel bond Gigabit interfaces ? I have no experience with that switch, but have a few comments... For gigabit NICs for Linux, we have had good experience with the NetGear GA620 cards (not the GA622s, which are a different chipset). They are well supported by the acenic driver and function well for us. As for channel bonding, I really don't think you'll need it for genetic programming and neural networks, as those aren't traditionally very-high-bandwidth applications if I am thinking about them correctly. (That is, if by genetic programming you mean genetic algorithms and not bioinformatics.) Given the added complexity and time needed to implement channel bonding, I just don't think it would be worth it in this case. It seems to me that many cluster workloads work fine with just fast ethernet to the nodes. Chris
From rgb at phy.duke.edu Wed Nov 7 08:26:39 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:51 2009 Subject: RDRAM vs SDRAM redux Message-ID: Dear List Humans, Life continues to get more puzzling all the time. We are working out final configurations for a mixed purchase of P4's and Athlon XP's. Or so I thought when I started to review the hardware alternatives this morning.
I'm basically getting ready to update a quote from three months ago but the world has of course changed substantially in the meantime. The Athlon update was fairly easy. It looks like the KT266A chipset is probably the one of choice for a single CPU solution (which I'm inclined to) and in the meantime 512 MB DDR PC2100 DIMMS are now cheaper than 256 DIMMS were in the first quote. Also choosing the XP for a single CPU choice is a no-brainer. The P4's are much more difficult because there are now SDRAM chipsets. Does anyone have words of wisdom (or benchmarks!) to offer for the performance of P4's running e.g. lattice QCD or other numbers, especially those illustrating differences between code that uses SSE instructions? I already found http://qcdhome.fnal.gov/cluster_design/benchmarks.html but it is a bit dated (being all of five months old:-) and doesn't include KT266A and XP OR SDRAM-equipped P4's. I'm especially interested on what the best choice would be for a P4 intended to do well on memory-intensive code, e.g. Intel 845 (SDRAM but CPU up to 2 GHz) or 850 (RDRAM but only 1.8 GHz?) or SiS 645 (DDR up to 2 GHz) as there are getting to be a truly dazzling array of alternatives. An obvious question is whether or not our lattice QCD folks and/or quark-gluon plasma folks really need to get the P4's to hedge their bets at this point. The benchmark results above at FNAL show the P4 holding a small (~20%) lead over the Palomino out in the large lattice sizes likely to be dominated by memory speed. The stream results for the P4, especially with SSE instructions, still are much better for the P4 than (say) the Palomino, but the KT266A suppposedly delivers 20-30% better DDR performance than the KT266 did (and maybe than the AMD 760 used on the Tyan Thunder?). There is also no clear indication on whether using an SSE compiler with the XP makes a difference -- does the XP support SSE1 and/or SSE2 instructions? Sigh. Any help on these questions would be greatly appreciated. Also, if the FNAL folks are listening and have some newer boxes handy, it would be fabulous of you to update your benchmarks above. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From jlb17 at duke.edu Wed Nov 7 08:46:22 2001 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:01:51 2009 Subject: Promise SX6000 vs Adaptec 2400A In-Reply-To: Message-ID: On Thu, 8 Nov 2001 at 12:01am, Kian_Chang_Low@vdgc.com.sg wrote > With 3ware existing the IDE raid storage card market, I was looking for a > replacement for a cluster and came across the Promise Supertrak SX6000 and > Adaptec ATA RAID 2400A. FWIW, someone with a *lot* of interest in big storage systems recently posted to the linux-ide-arrays list that 3ware have reversed their decision and will be getting back into the IDE raid card business (including releasing the 7850 RSN). A press release is supposed to be forthcoming. The Escalade pages are already back up. 
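As an aside to rgb's RDRAM-vs-SDRAM question above: the stream figures he cites are easy to re-measure on candidate boxes. A rough sketch -- the download URL and build flags are from memory and may need adjusting for your setup:

# fetch and build McCalpin's STREAM benchmark (C version)
wget http://www.cs.virginia.edu/stream/FTP/Code/stream.c
gcc -O3 -funroll-loops stream.c -o stream
./stream     # reports Copy/Scale/Add/Triad bandwidth in MB/s
# rebuild with an SSE-capable compiler (e.g. Intel's icc) and compare
# the Triad figures to gauge what the SSE question is worth on your code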
-- Joshua Baker-LePain Department of Biomedical Engineering Duke University From jtracy at ist.ucf.edu Wed Nov 7 07:52:00 2001 From: jtracy at ist.ucf.edu (Judd Tracy) Date: Wed Nov 25 01:01:51 2009 Subject: ExtremeNetworks Summit and channel bonding with Scyld In-Reply-To: <1005145891.webexpressdV3.1.f@smtp.freesurf.ch> Message-ID: On Wed, 7 Nov 2001, Javier Iglesias wrote: > Hi all, > > We are about to build a 19 bi-athlon/Tyan Tiger/Scyld cluster > for academic research in the field of genetic programming, and > large neural networks. > > We'd like to use an Extreme Networks Summit 48 ethernet switch > -> http://www.extremenetworks.com/products/datasheets/summit24.asp > connecting (highly recommended here recently :) Netgear FA310TX NICs > -> http://www.netgear.com/product_view.asp?xrp=1&yrp=1&zrp=4 > > Here come the questions : > * has anyone experienced channel bonding on that switch ? I am having a representative from extreme bring a switch by our lab to test out chanel bonding. > * any Gigabit NIC recommandation for the master node ? > * is it possible/necessary to channel bond Gigabit interfaces ? You can, but you might not get much benefit. Make sure that you are using 64 bit cards because the 32 bit pci bus can't really handle two cards. > Thanks in advance for your help !! > > --javier > > -- > Kate Stevensen sagt: Meine Mission ist geheim! Finde es raus! > http://www.sunrise.net/exclude/track/action.asp?PID_S=592&PID_T=593&LID=1 > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Judd Tracy Institute for Simulation and Training jtracy@ist.ucf.edu From Florent.Calvayrac at univ-lemans.fr Wed Nov 7 10:32:49 2001 From: Florent.Calvayrac at univ-lemans.fr (Florent Calvayrac) Date: Wed Nov 25 01:01:51 2009 Subject: ExtremeNetworks Summit and channel bonding with Scyld References: <1005145891.webexpressdV3.1.f@smtp.freesurf.ch> Message-ID: <3BE97E51.798825CF@univ-lemans.fr> > We are about to build a 19 bi-athlon/Tyan Tiger/Scyld cluster > > > > Here come the questions : > * has anyone experienced channel bonding on that switch ? This is irrelevant to your question, and maybe you are already aware of the problem, but using channel bonding with ( at least 1 $ / version 7) Scyld requires some modifications to beoboot and to the kernel, of course. We managed to get a solution that works, in case anyone is interested... -- Florent Calvayrac UMR-CNRS 6087 | http://www.univ-lemans.fr/~fcalvay Universite du Maine-Faculte des Sciences | 72085 Le Mans Cedex 9 From brian at chpc.utah.edu Wed Nov 7 11:40:55 2001 From: brian at chpc.utah.edu (Brian Haymore) Date: Wed Nov 25 01:01:51 2009 Subject: Promise SX6000 vs Adaptec 2400A References: Message-ID: <3BE98E47.5060705@chpc.utah.edu> FYI, 3Ware will be announcing shorting that Escalade is not in fact going away. So those that did like the 3ware product this is great news. Joshua Baker-LePain wrote: > On Thu, 8 Nov 2001 at 12:01am, Kian_Chang_Low@vdgc.com.sg wrote > > >>With 3ware existing the IDE raid storage card market, I was looking for a >>replacement for a cluster and came across the Promise Supertrak SX6000 and >>Adaptec ATA RAID 2400A. 
>> > > FWIW, someone with a *lot* of interest in big storage systems recently > posted to the linux-ide-arrays list that 3ware have reversed their > decision and will be getting back into the IDE raid card business > (including releasing the 7850 RSN). A press release is supposed to be > forthcoming. The Escalade pages are already back up. > > -- Brian D. Haymore University of Utah Center for High Performance Computing 155 South 1452 East RM 405 Salt Lake City, Ut 84112-0190 Email: brian@chpc.utah.edu - Phone: (801) 585-1755 - Fax: (801) 585-5366 From yoon at bh.kyungpook.ac.kr Wed Nov 7 23:11:01 2001 From: yoon at bh.kyungpook.ac.kr (Yoon Jae Ho) Date: Wed Nov 25 01:01:51 2009 Subject: HPL residual check failure References: Message-ID: <004401c16824$8a17a000$5f72f2cb@LocalHost> I found your e-mail today. I can't find your system information in your e-mail. but I guess if you use Myrinet instead of 10/100 LAN. then please check the Cable & Myrinet mpich version. If you use 10/100 LAN, then I guess your failure for the matrix size 23,000 is related to RAM . Please check the RAM size & Physical Problems of your Workstations RAM first. There may be problems in the heat but I think in the normal temperature, Workstation must be endured without fail. Have a NIce Day. --------------------------------------------------------------------- Yoon Jae Ho Economist POSCO Research Institute yoon@bh.kyungpook.ac.kr jhyoon@mail.posri.re.kr http://ie.korea.ac.kr/~supercom/ Korea Beowulf Supercomputer http://members.ud.com/services/teams/team.htm?id=264C68D5-CB71-429F-923D-8614F419065D Help the people with your PC Imagination is more important than knowledge. A. Einstein "??????? ??? ???" ??? ??, " ??? ??? ??" ?? ??? ??(???? ???? ???? ??) "????? '???? ????? ??'??? ? ? ???, ??? ??? ????? ??? ????." ?? ??? "??? ?? ??? ??? ??? ??? ??? ??" ??? 2000.4.22 "???? ???? ?? ??? ??? ??? ????" ? ?? 2000.4.29 "???? ??? ??? ??? ??? ????" ? ?? 2000.4.24 http://www.kichun.co.kr 2001.1.6 http://www.c3tv.com 2001.1.10 ------------------------------------------------------------------------ ----- Original Message ----- From: ??? To: Sent: Monday, September 03, 2001 9:22 PM Subject: HPL residual check failure > Hi > When I was doing HPL benchmark test using big matrix(bigger than 20,000 ) with many linux server(more than 20), sometimes I got residual check error as attached. > When I got residual check error, I turned off my linux servers for several hours and then tried again. And usually it worked - I don't know the reason. > Heat is suspicious. But, is it really heat problem? > Is there anybody who have experienced similar problem or know the reason? > please help me. > > Thanks in advance! > > Keaton > > > HPL result files------------------------------------------------------------ > > ============================================================================ > T/V N NB P Q Time Gflops > ---------------------------------------------------------------------------- > W11R2C4 21000 200 6 6 702.80 8.786e+00 > ---------------------------------------------------------------------------- > ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0272768 ...... PASSED > ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0140749 ...... PASSED > ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0026585 ...... 
> ============================================================================
> T/V           N    NB  P  Q    Time      Gflops
> ----------------------------------------------------------------------------
> W11R2C4   23000   200  6  6  866.35   9.364e+00
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N )        = 3255.3898794 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )  = 7833.1904572 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1364.3123654 ...... FAILED
> ||Ax-b||_oo . . . . . . . . . . . . . . . . . = 0.000049
> ||A||_oo . . . . . . . . . . . . . . . . . . . = 5827.145943
> ||A||_1 . . . . . . . . . . . . . . . . . . . = 5836.795619
> ||x||_oo . . . . . . . . . . . . . . . . . . . = 2.390054
> _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Kian_Chang_Low at vdgc.com.sg Thu Nov 8 00:04:44 2001 From: Kian_Chang_Low at vdgc.com.sg (Kian_Chang_Low@vdgc.com.sg) Date: Wed Nov 25 01:01:51 2009 Subject: Promise SX6000 vs Adaptec 2400A Message-ID: Hi, Thanks for the info. I was wondering whether anyone on the list has experience with either the Promise or the Adaptec cards and would like to share their experience? Especially with drivers for the newer kernels. Thanks, Kian Chang. Joshua Baker-LePain wrote on 11/08/01 12:46 AM: On Thu, 8 Nov 2001 at 12:01am, Kian_Chang_Low@vdgc.com.sg wrote > With 3ware exiting the IDE RAID storage card market, I was looking for a > replacement for a cluster and came across the Promise Supertrak SX6000 and > Adaptec ATA RAID 2400A. FWIW, someone with a *lot* of interest in big storage systems recently posted to the linux-ide-arrays list that 3ware have reversed their decision and will be getting back into the IDE RAID card business (including releasing the 7850 RSN). A press release is supposed to be forthcoming. The Escalade pages are already back up. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University
From leunen.d at fsagx.ac.be Thu Nov 8 03:56:05 2001 From: leunen.d at fsagx.ac.be (David Leunen) Date: Wed Nov 25 01:01:51 2009 Subject: Scyld iso image Message-ID: <3BEA72D5.FA062766@fsagx.ac.be> Hello, Does anyone know an ftp site where I can find the iso image of the latest Scyld? I really can't wait for the CD from Linux Central (and it is an older version). I have a pretty fast connection, so it shouldn't take long. I would very much appreciate it if you could provide it for me. You can answer to my personal e-mail or on this mailing list. Thank you. David
From patrick at myri.com Thu Nov 8 03:53:08 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:51 2009 Subject: HPL residual check failure References: <004401c16824$8a17a000$5f72f2cb@LocalHost> Message-ID: <3BEA7224.C9C36C12@myri.com> Yoon Jae Ho wrote: > if you use Myrinet instead of 10/100 LAN, then please check the cable & Myrinet mpich version. FYI, bad Myrinet cables do not produce corrupted data; there is a hardware CRC check on the NIC. Corrupted packets are just dropped, so symptoms of bad cables are messages timing out or very slow.
You can look at the number of bad CRCs (badcrc_cnt) with " gm_counters" (if you are using GM). In the context of Keaton's failure, bad memory is certainely the problem. Usually, if things works after cooling the unit, it's very likely to be overheating hardware. Patrick ---------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | Myricom, Inc. http://www.myri.com | Cell: 865-389-8852 685 Emory Valley Rd (B) | Phone: 865-425-0978 Oak Ridge, TN 37830 ---------------------------------------------------------- From j.c.burton at gats-inc.com Thu Nov 8 07:39:19 2001 From: j.c.burton at gats-inc.com (John Burton) Date: Wed Nov 25 01:01:51 2009 Subject: Multiple Promise Ultra 100 TX2 controllers... Message-ID: <3BEAA727.3F425C33@gats-inc.com> Greetings! Over the past month, I've been trying to build a 500GB ATA/100 RAID 5 array and have encountered multiple problems along the way. My system consists of: * SuperMicro 370DL3 motherboard w/ Adaptec Ultra 160 SCSI and 100mbit nic (eepro100) onboard. * 2 1GHz PIII processors w/ 512MB memory. * 9GB Quantum Atlas Ultra 160 SCSI system disk. * 100GB Seagate AIT tape Autoloader. * Seagate DDS-3 4mm tape drive. * 6 x 100GB Western Digital ATA/100 disks * 2 x 3Ware hotswap chassis - fits 3 ATA/100 1" disks in a 2 bay area * RedHat 7.2 The latest problem I've been having is with multiple Promise Ultra 100 tx2 controllers - with 6 disks, I need 6 IDE channels which means 3 Ultra 100 controllers. I had purchased one tx2 earlier this year (early spring) and just this past week purchased 2 more. I installed them and connected them to the 6 disks. When I booted the machine, I got the Promise Ultra BIOS screen detecting the drives, and then it displays a list of 8 possible drives (D0 - D7). D0, D2, D4, & D6 have disks listed next to them and D1, D3, D5, & D7 do not have any disks listed (this is expected since I'm only using 1 master drive per channel). What is not expected is that there are only 8 possible drives listed. With 3 controllers, there should be 12 possible drives with 6 drives detected. When Linux started booting, I noticed that all 3 controllers and 6 disks were detected. So far so good. When the kernel started checking for partitions on the disks, it ran into trouble (last two disks giving DMA errors). Below is the appropriate log entries showing what happened. According to the logs it looks like there is a problem with either the 3rd controller or the last 2 disks. I rearranged the order of the controllers (i.e. swapped which cards were installed in which slots) and left the order of the disks the same (first two disks attached to the controller in the first PCI slot, etc). And got the same results (last two disks showing DMA errors). I then changed the order of the disks relative to the PCI slots and still got the same results (last two disks giving DMA errors). I then removed one controller at a time (leaving 2 installed at any one time) and connected various combinations of 4 disks from the available 6. Everything worked fine, with no errors. At this point I'm kinda stuck with the conclusion that only 2 Promise Ultra100 TX2 cards will work in that system at one time. Does anyone have any suggestions? thoughts? help? 
Hopefully waiting, John SYSLOG Entries: Nov 7 13:49:48 oracle kernel: PDC20268: IDE controller on PCI bus 00 dev 20 Nov 7 13:49:48 oracle kernel: PDC20268: chipset revision 1 Nov 7 13:49:48 oracle kernel: PDC20268: not 100%% native mode: will probe irqs later Nov 7 13:49:48 oracle kernel: PDC20268: ROM enabled at 0xfeaf8000 Nov 7 13:49:48 oracle kernel: PDC20268: (U)DMA Burst Bit DISABLED Primary PCI Mode Secondary MASTER Mode. Nov 7 13:49:48 oracle kernel: ide2: BM-DMA at 0xdf90-0xdf97, BIOS settings: hde:pio, hdf:pio Nov 7 13:49:48 oracle kernel: ide3: BM-DMA at 0xdf98-0xdf9f, BIOS settings: hdg:pio, hdh:pio Nov 7 13:49:48 oracle kernel: PDC20268: IDE controller on PCI bus 00 dev 18 Nov 7 13:49:48 oracle kernel: PDC20268: chipset revision 1 Nov 7 13:49:48 oracle kernel: PDC20268: not 100%% native mode: will probe irqs later Nov 7 13:49:48 oracle kernel: PDC20268: ROM enabled at 0xfeaec000 Nov 7 13:49:48 oracle kernel: PDC20268: (U)DMA Burst Bit ENABLED Primary MASTER Mode Secondary MASTER Mode. Nov 7 13:49:48 oracle kernel: ide4: BM-DMA at 0xdf60-0xdf67, BIOS settings: hdi:pio, hdj:pio Nov 7 13:49:48 oracle kernel: ide5: BM-DMA at 0xdf68-0xdf6f, BIOS settings: hdk:pio, hdl:pio Nov 7 13:49:49 oracle kernel: PDC20268: IDE controller on PCI bus 00 dev 10 Nov 7 13:49:49 oracle kernel: PDC20268: chipset revision 1 Nov 7 13:49:49 oracle kernel: PDC20268: not 100%% native mode: will probe irqs later Nov 7 13:49:49 oracle kernel: PDC20268: ROM enabled at 0xfeae4000 Nov 7 13:49:49 oracle kernel: PDC20268: (U)DMA Burst Bit ENABLED Primary MASTER Mode Secondary MASTER Mode. Nov 7 13:49:49 oracle kernel: ide6: BM-DMA at 0xdf30-0xdf37, BIOS settings: hdm:pio, hdn:pio Nov 7 13:49:49 oracle kernel: ide7: BM-DMA at 0xdf38-0xdf3f, BIOS settings: hdo:pio, hdp:pio Nov 7 13:49:49 oracle kernel: ServerWorks OSB4: IDE controller on PCI bus 00 dev 79 Nov 7 13:49:49 oracle kernel: ServerWorks OSB4: chipset revision 0 Nov 7 13:49:49 oracle kernel: ServerWorks OSB4: not 100%% native mode: will probe irqs later Nov 7 13:49:49 oracle kernel: ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio Nov 7 13:49:49 oracle kernel: ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio Nov 7 13:49:49 oracle kernel: hdc: CD-ROM CDU311, ATAPI CD/DVD-ROM drive Nov 7 13:49:49 oracle kernel: hde: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: hdg: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: hdi: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: hdk: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: hdm: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: hdo: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: ide1 at 0x170-0x177,0x376 on irq 15 Nov 7 13:49:49 oracle kernel: ide2 at 0xdff0-0xdff7,0xdfe6 on irq 22 Nov 7 13:49:49 oracle kernel: ide3 at 0xdfa8-0xdfaf,0xdfe2 on irq 22 Nov 7 13:49:49 oracle kernel: ide4 at 0xdfa0-0xdfa7,0xdf8e on irq 20 Nov 7 13:49:49 oracle kernel: ide5 at 0xdf80-0xdf87,0xdf8a on irq 20 Nov 7 13:49:49 oracle kernel: ide6 at 0xdf58-0xdf5f,0xdf7e on irq 18 Nov 7 13:49:49 oracle kernel: ide7 at 0xdf50-0xdf57,0xdf4e on irq 18 Nov 7 13:49:49 oracle kernel: blk: queue c0435808, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: blk: queue c0435808, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: hde: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, (U)DMA Nov 7 13:49:49 oracle kernel: blk: queue c0435b4c, I/O limit 4095Mb (mask 0xffffffff) 
Nov 7 13:49:49 oracle kernel: blk: queue c0435b4c, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: hdg: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, (U)DMA Nov 7 13:49:49 oracle kernel: blk: queue c0435e90, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: blk: queue c0435e90, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: hdi: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, UDMA(100) Nov 7 13:49:49 oracle kernel: blk: queue c04361d4, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: blk: queue c04361d4, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: hdk: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, UDMA(100) Nov 7 13:49:50 oracle kernel: blk: queue c0436518, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: blk: queue c0436518, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: hdm: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, UDMA(100) Nov 7 13:49:50 oracle kernel: blk: queue c043685c, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: blk: queue c043685c, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: hdo: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, UDMA(100) Nov 7 13:49:50 oracle kernel: ide-floppy driver 0.97.sv Nov 7 13:49:50 oracle kernel: Partition check: Nov 7 13:49:50 oracle kernel: hde: [PTBL] [12161/255/63] hde1 Nov 7 13:49:50 oracle kernel: hdg: [PTBL] [12161/255/63] hdg1 Nov 7 13:49:50 oracle kernel: hdi: [PTBL] [12161/255/63] hdi1 Nov 7 13:49:50 oracle kernel: hdk: [PTBL] [12161/255/63] hdk1 Nov 7 13:49:50 oracle kernel: hdm:hdm: dma_intr: status=0x51 { DriveReady SeekComplete Error } Nov 7 13:49:50 oracle kernel: hdm: dma_intr: error=0x84 { DriveStatusError BadCRC } Nov 7 13:49:50 oracle kernel: hdm: dma_intr: status=0x51 { DriveReady SeekComplete Error } Nov 7 13:49:50 oracle kernel: hdm: dma_intr: error=0x84 { DriveStatusError BadCRC } Nov 7 13:49:50 oracle kernel: hdm: timeout waiting for DMA Nov 7 13:49:50 oracle kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14 Nov 7 13:49:50 oracle kernel: blk: queue c0436518, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: blk: queue c0436518, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: [PTBL] [12161/255/63] hdm1 Nov 7 13:49:50 oracle kernel: hdo:hdo: timeout waiting for DMA Nov 7 13:49:50 oracle kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14 Nov 7 13:49:50 oracle kernel: blk: queue c043685c, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: blk: queue c043685c, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: [PTBL] [12161/255/63] hdo1 From lindahl at conservativecomputer.com Thu Nov 8 07:57:36 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: Using NFS with Scyld (-7 ver.) In-Reply-To: ; from becker@scyld.com on Mon, Nov 05, 2001 at 02:03:06PM -0500 Message-ID: <20011108105736.B12344@wumpus.foo> On Mon, Nov 05, 2001 at 02:03:06PM -0500, Donald Becker wrote: > It does work: I wrote the original user-level NFS server (unfsd) used by > Linux, and re-exporting was one of the primary advantages over the Sun > implementation. Not only does re-exporting work, it works well enough that the CPlant people use it for their single system disk, which is shared by more than 1,000 nodes. 
There's 1 node per rack which mounts the one disk and re-exports it to the other nodes in the rack. The extra caching is very important to them. greg
From lindahl at conservativecomputer.com Thu Nov 8 07:58:14 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: Compile farm? In-Reply-To: <20011104212912.W14001@unthought.net>; from jakob@unthought.net on Sun, Nov 04, 2001 at 09:29:12PM +0100 Message-ID: <20011108105814.C12344@wumpus.foo> On Sun, Nov 04, 2001 at 09:29:12PM +0100, Jakob Østergaard wrote: > Problem is - mosix migrates jobs after a while. Initially a compiler > takes up a few megabytes of memory, but "after a while" it has grown > to hundreds of megabytes. When mosix decides to migrate the compiler > it will spend a long time on the network to move the large process > image. I've never used Mosix. Does it have the ability to set policies like "this binary should always be immediately migrated at exec" or "all processes should be migrated at exec"? You'd think it would... and using such policies would solve this particular problem. greg
From lindahl at conservativecomputer.com Thu Nov 8 07:59:30 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: Promise SX6000 vs Adaptec 2400A In-Reply-To: ; from Kian_Chang_Low@vdgc.com.sg on Thu, Nov 08, 2001 at 12:01:48AM +0800 Message-ID: <20011108105930.F12344@wumpus.foo> On Thu, Nov 08, 2001 at 12:01:48AM +0800, Kian_Chang_Low@vdgc.com.sg wrote: > With 3ware exiting the IDE RAID storage card market, I was looking for a > replacement for a cluster and came across the Promise Supertrak SX6000 and > Adaptec ATA RAID 2400A. If I understand these cards correctly, both are I2O cards. So you can use their proprietary drivers, or you can use Linux's I2O driver, and in both cases, you can put multiple controllers on one system. I haven't tried it personally, though. greg
From raysonlogin at yahoo.com Thu Nov 8 08:14:22 2001 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP Message-ID: <20011108161422.43393.qmail@web11403.mail.yahoo.com> I found something interesting from AMD's developer site: ...processor also features the advanced _MOESI_ cache coherency protocol to ensure efficient cache integrity in a multiprocessing environment. MOESI provides better performance in MP configurations, due to the added O (owned) state. In theory, this can provide better cache performance. Also, please read the section "Why No Dual-Processing Support with Thunderbird?" in the following web page: http://www.creativecow.net/articles/hawes_tyler/amd_mps/athlonmp_760mp_full.html Rayson References: AMD Athlon XP Processor Model 6 Data Sheet TM: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24309.pdf AMD Athlon MP Processor Model 6 Data Sheet Multiprocessor-Capable for Workstation and Server Platforms TM: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24685.pdf
From joelja at darkwing.uoregon.edu Thu Nov 8 07:47:17 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:51 2009 Subject: Multiple Promise Ultra 100 TX2 controllers...
In-Reply-To: <3BEAA727.3F425C33@gats-inc.com> Message-ID: Build a kernel and make sure that in the IDE/ATA section CONFIG_PDC202XX_BURST (the caption says something like "Special UDMA Feature") is enabled... that works around that bug in the Ultra100/Ultra66... there's more info in <path to kernel>/drivers/ide/pdc202xx.c joelja On Thu, 8 Nov 2001, John Burton wrote: > Greetings! > > Over the past month, I've been trying to build a 500GB ATA/100 RAID 5 > array and have encountered multiple problems along the way. [...] > At this > point I'm kinda stuck with the conclusion that only 2 Promise Ultra100 > TX2 cards will work in that system at one time. > > Does anyone have any suggestions? thoughts? help? > [rest of the quoted message and syslog snipped; see John's original post above]
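A sketch of the rebuild Joel describes, assuming a 2.4-series source tree in the usual place -- the exact menu wording varies between kernel versions, so treat the comment as a pointer rather than gospel:

cd /usr/src/linux
make menuconfig   # under "ATA/IDE/MFM/RLL support", enable the
                  # "Special UDMA Feature" option (CONFIG_PDC202XX_BURST=y)
make dep bzImage modules modules_install
# copy arch/i386/boot/bzImage into place, update lilo.conf, rerun lilo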
-- -------------------------------------------------------------------------- Joel Jaeggli joelja@darkwing.uoregon.edu Academic User Services consult@gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843.
From cblack at eragen.com Thu Nov 8 08:23:58 2001 From: cblack at eragen.com (Chris Black) Date: Wed Nov 25 01:01:51 2009 Subject: Multiple Promise Ultra 100 TX2 controllers...
In-Reply-To: <3BEAA727.3F425C33@gats-inc.com>; from j.c.burton@gats-inc.com on Thu, Nov 08, 2001 at 10:39:19AM -0500 References: <3BEAA727.3F425C33@gats-inc.com> Message-ID: <20011108112358.A11047@getafix.EraGen.com> So, to summarize: two controllers work fine, three fail. Have you turned on CONFIG_PDC202XX_BURST in your kernel config? It is in the IDE/ATA section of the kernel config. It shows up in menuconfig as "Special UDMA feature" and the help text says: For PDC20246, PDC20262, PDC20265 and PDC20267 Ultra DMA chipsets. Designed originally for PDC20246/Ultra33 that has BIOS setup failures when using 3 or more cards. Unknown for PDC20265/PDC20267 Ultra DMA 100. Please read the comments at the top of drivers/ide/pdc202xx.c. If unsure, say N. It sounds like it might help. Chris On Thu, Nov 08, 2001 at 10:39:19AM -0500, John Burton wrote: > Greetings! > > Over the past month, I've been trying to build a 500GB ATA/100 RAID 5 > array and have encountered multiple problems along the way. My system > consists of: > > * SuperMicro 370DL3 motherboard w/ Adaptec Ultra 160 SCSI and 100mbit > nic (eepro100) onboard. > * 2 1GHz PIII processors w/ 512MB memory. > * 9GB Quantum Atlas Ultra 160 SCSI system disk. > * 100GB Seagate AIT tape Autoloader. > * Seagate DDS-3 4mm tape drive. > * 6 x 100GB Western Digital ATA/100 disks > * 2 x 3Ware hotswap chassis - fits 3 ATA/100 1" disks in a 2 bay area > * RedHat 7.2 > > The latest problem I've been having is with multiple Promise Ultra 100 > tx2 controllers - with 6 disks, I need 6 IDE channels which means 3 > Ultra 100 controllers. I had purchased one tx2 earlier this year (early > spring) and just this past week purchased 2 more. I installed them and > connected them to the 6 disks. > > When I booted the machine, I got the Promise Ultra BIOS screen detecting > the drives, and then it displays a list of 8 possible drives (D0 - D7). > D0, D2, D4, & D6 have disks listed next to them and D1, D3, D5, & D7 do > not have any disks listed (this is expected since I'm only using 1 > master drive per channel). What is not expected is that there are only 8 > possible drives listed. With 3 controllers, there should be 12 possible > drives with 6 drives detected. > -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20011108/332f8596/attachment.bin
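Whether a given kernel was built with that option can be checked without rebuilding anything; a minimal sketch, assuming the distribution installed its build config under /boot or that a configured source tree sits in /usr/src/linux:

    # look for the PDC202xx driver and its burst option in whichever
    # kernel config file happens to be present on this box
    grep PDC202XX /boot/config-`uname -r` /usr/src/linux/.config 2>/dev/null

If CONFIG_PDC202XX_BURST comes back unset or missing, the kernel needs to be reconfigured and rebuilt before the three-card setup can be retested.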
From math at velocet.ca Thu Nov 8 09:40:29 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <20011108161422.43393.qmail@web11403.mail.yahoo.com>; from raysonlogin@yahoo.com on Thu, Nov 08, 2001 at 08:14:22AM -0800 References: <20011108161422.43393.qmail@web11403.mail.yahoo.com> Message-ID: <20011108124029.T32202@velocet.ca> On Thu, Nov 08, 2001 at 08:14:22AM -0800, Rayson Ho's all... > I found something interesting from AMD's developer site: > > ...processor also features the advanced _MOESI_ cache coherency > protocol to ensure efficient cache integrity in a multiprocessing > environment. > > MOESI provides better performance in MP configurations, due to the > added O (owned) state. In theory, this can provide better cache > performance. What packages support MOESI and 3DNow!Professional? When will they? /kc -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From raysonlogin at yahoo.com Thu Nov 8 10:20:36 2001 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <20011108124029.T32202@velocet.ca> Message-ID: <20011108182036.64364.qmail@web11403.mail.yahoo.com> MOESI is a cache coherency protocol; you don't need new software or compiler support, all you need is the hardware chipset. 3DNow! Professional is essentially Intel's SSE, which Intel's compiler can generate vectorized code for. Rayson --- Velocet wrote: > What packages support MOESI and 3DNow!Professional? When will they? > > /kc > -- > Ken Chase, math@velocet.ca * Velocet Communications Inc. * > Toronto, CANADA > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com From math at velocet.ca Thu Nov 8 11:28:40 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <20011108182036.64364.qmail@web11403.mail.yahoo.com>; from raysonlogin@yahoo.com on Thu, Nov 08, 2001 at 10:20:36AM -0800 References: <20011108124029.T32202@velocet.ca> <20011108182036.64364.qmail@web11403.mail.yahoo.com> Message-ID: <20011108142839.X32202@velocet.ca> On Thu, Nov 08, 2001 at 10:20:36AM -0800, Rayson Ho's all... > MOESI is a cache coherency protocol; you don't need new software or > compiler support, all you need is the hardware chipset. From lindahl at conservativecomputer.com Thu Nov 8 11:48:52 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <20011108142839.X32202@velocet.ca>; from math@velocet.ca on Thu, Nov 08, 2001 at 02:28:40PM -0500 References: <20011108124029.T32202@velocet.ca> <20011108182036.64364.qmail@web11403.mail.yahoo.com> <20011108142839.X32202@velocet.ca> Message-ID: <20011108144852.A12948@wumpus.foo> On Thu, Nov 08, 2001 at 02:28:40PM -0500, Velocet wrote: > From what I understood from the useful articles that were > posted here, the cache protocol allows sharing data between the CPUs > via the northbridge directly. Right. What it comes down to is this: Getting data from L2 is always fastest if it's in your own L2. But if it isn't, some machines fetch from main memory faster than they can fetch a dirty line from someone else's L2. AMD's scheme has reasonably fast main memory fetches, plus even more efficient fetches from a remote L2. I believe the Sun E10k is one of the few machines where main memory is closer than someone else's L2. That makes false sharing even worse than usual. However, from the beowulf standpoint, most of us are running 2 independent mpi processes on dual cpu boxes, right? g
From tlovie at pokey.mine.nu Thu Nov 8 12:36:08 2001 From: tlovie at pokey.mine.nu (Thomas Lovie) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <20011108144852.A12948@wumpus.foo> Message-ID: <000e01c16894$ff3dc010$1106a8c0@sneezy> > On Thu, Nov 08, 2001 at 02:28:40PM -0500, Velocet wrote: > > > From what I understood from the useful articles that were > posted here, > > the cache protocol allows sharing data between the CPUs via the > > northbridge directly. > > Right. What it comes down to is this: Getting data from L2 is > always fastest if it's in your own L2. But if it isn't, some > machines fetch from main memory faster than they can fetch a > dirty line from someone else's L2. AMD's scheme has > reasonably fast main memory fetches, plus even more efficient > fetches from a remote L2. > > I believe the Sun E10k is one of the few machines where > main memory is closer than someone else's L2. That makes > false sharing even worse than usual. > > However, from the beowulf standpoint, most of us are running > 2 independent mpi processes on dual cpu boxes, right? I have an innocent question.... Does the kernel have processor affinity built into it yet? The situation may arise that one of the mpi processes gets bumped from its processor by a system task, then it in turn bumps the other mpi task from the other processor, and in effect, its info is cached in the other processor. I see the advantage that fetching from a remote L2 is better, but does anybody know the status of assigning a processor affinity mask to processes? Tom Lovie. From lindahl at conservativecomputer.com Thu Nov 8 12:55:30 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <000e01c16894$ff3dc010$1106a8c0@sneezy>; from tlovie@pokey.mine.nu on Thu, Nov 08, 2001 at 03:36:08PM -0500 References: <20011108144852.A12948@wumpus.foo> <000e01c16894$ff3dc010$1106a8c0@sneezy> Message-ID: <20011108155530.A13104@wumpus.foo> On Thu, Nov 08, 2001 at 03:36:08PM -0500, Thomas Lovie wrote: > I have an innocent question.... Does the kernel have processor affinity > built into it yet? Yes, and has for ages. Use the source, Luke: /usr/src/linux/kernel/sched.c, function goodness. This is 2.4: #ifdef CONFIG_SMP /* Give a largish advantage to the same processor... */ /* (this is equivalent to penalizing other processors) */ if (p->processor == this_cpu) weight += PROC_CHANGE_PENALTY; #endif g From hahn at physics.mcmaster.ca Thu Nov 8 13:10:17 2001 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <000e01c16894$ff3dc010$1106a8c0@sneezy> Message-ID: > I have an innocent question.... Does the kernel have processor affinity > built into it yet? yes, it has for years (even 2.2). in the mainline kernel, the affinity is just "be reluctant to move processes", but there's a patch (pset) if you really think you can do better manually. From raysonlogin at yahoo.com Thu Nov 8 13:23:13 2001 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <000e01c16894$ff3dc010$1106a8c0@sneezy> Message-ID: <20011108212313.26013.qmail@web11401.mail.yahoo.com> --- Thomas Lovie wrote: > I have an innocent question.... Does the kernel have processor > affinity > built into it yet? For the Linux kernel, yes. For Solaris, I don't know. Didn't have time to sign the NDA in order to access the Solaris source :-) Rayson > The situation may arise that one of the mpi > processes gets bumped from its processor by a system task, then it > in > turn bumps the other mpi task from the other processor, and in > effect, > its info is cached in the other processor. I see the advantage that > fetching from a remote L2 is better, but does anybody know the status > of > assigning a processor affinity mask to processes? > > Tom Lovie. > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com
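For those curious how sticky the 2.4 scheduler's affinity really is, it can be watched from userspace; a rough sketch (the a.out process name and the one-second interval are placeholders, and the PSR column requires a procps-style ps):

    # poll which CPU (PSR) each matching process is sitting on;
    # with the PROC_CHANGE_PENALTY heuristic the column should
    # change only rarely, even under light system load
    while true; do
        ps -eo pid,psr,pcpu,comm | grep a.out
        sleep 1
    done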
From cfernandes at elo.com.br Thu Nov 8 17:33:01 2001 From: cfernandes at elo.com.br (Claudio Fernandes) Date: Wed Nov 25 01:01:51 2009 Subject: Performance of pararallel programs Message-ID: <01110823330104.01108@master> hello, I would like to know about any tools to measure the performance of parallel programs over mpich in a scyld beowulf cluster. I'm looking for any trace library that keeps a record of a program's MPI calls. Thank you. Claudio Fernandes UNIVERSIDADE FEDERAL DO RN (UFRN) BRAZIL From jlong at arsc.edu Thu Nov 8 17:58:15 2001 From: jlong at arsc.edu (James Long) Date: Wed Nov 25 01:01:51 2009 Subject: Performance of pararallel programs In-Reply-To: <01110823330104.01108@master> References: <01110823330104.01108@master> Message-ID: http://www.pallas.de/pages/products.htm At 11:33 PM -0200 11/8/01, Claudio Fernandes wrote: >hello, > > I would like to know about any tools to measure the performance of parallel >programs over mpich in a scyld beowulf cluster. I'm looking for >any trace library that keeps a record of a program's MPI calls > > > Thank you. > > Claudio Fernandes > UNIVERSIDADE FEDERAL DO RN (UFRN) > BRAZIL > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf -- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% James Long MPP Specialist Arctic Region Supercomputing Center University of Alaska Fairbanks PO Box 756020 Fairbanks, AK 99775-6020 jlong@arsc.edu (907) 474-5731 work (907) 474-5494 fax %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% From patrick at myri.com Thu Nov 8 15:13:51 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:51 2009 Subject: Performance of pararallel programs References: <01110823330104.01108@master> Message-ID: <3BEB11AF.C72B2EC7@myri.com> Claudio Fernandes wrote: > I would like to know about any tools to measure the performance of parallel > programs over mpich in a scyld beowulf cluster. I'm looking for > any trace library that keeps a record of a program's MPI calls Jumpshot, included in MPICH. It's free and it works. I have been able to process trace files as large as 2 GBs. Patrick ---------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | Myricom, Inc. http://www.myri.com | Cell: 865-389-8852 685 Emory Valley Rd (B) | Phone: 865-425-0978 Oak Ridge, TN 37830 ---------------------------------------------------------- From ron_chen_123 at yahoo.com Thu Nov 8 19:44:09 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:51 2009 Subject: Fwd: Grid Engine Training at SC2001 Message-ID: <20011109034409.32545.qmail@web14702.mail.yahoo.com> Beowulf and PBS users/developers, There will be a training session at SC2001. What is included is an intro to the API initiative, which provides a standard for PBS and SGE. In the near future, we can have SGE/PBS sharing components! See the email below for details. -Ron --- Conrad Geiger wrote: > For those that are attending SC2001, there is a free > Grid Engine (SGE) training session available.
> If you are interested in this open source Beowulf > job > management system and would like to attend, please > email > me and show up at the Denver location and time > listed below: > > Class: SGE (Grid Engine) training > Date: Monday, November 12 > Time: 1:00 p.m. - 4:00 p.m. > Classroom location: Colorado Ballroom F > Marriott Hotel, 1701 California Street, > Denver > (near Denver Convention Center) > > AGENDA > GRID ENGINE (SGE) TECHNICAL PRESENTATION: > > Sun Grid Engine (1 hour) > -- overview of concepts > -- installation options > -- architecture > -- information flow > -- scheduling > -- complexes and resource > management > -- parallel and checkpointing > > Examples (30 minutes) > -- complexes > -- load sensor > -- license management > -- immediate vs. low priority jobs > > SGE/EE technology (15 minutes) > -- tickets > -- share tree, functional, > deadline, override > > Grid Engine Integration with ClusterTools > (20 minutes) > > Grid Engine Open Source Project and API initiative > (20 minutes) > > For registration please reply to: > > Conrad.Geiger@Sun.COM > __________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com From ron_chen_123 at yahoo.com Thu Nov 8 19:46:33 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:51 2009 Subject: One more SGE event @ SuperComputing 2001 Message-ID: <20011109034633.32861.qmail@web14702.mail.yahoo.com> --John Tollefsrud wrote: > > If you are attending SC2001 (www.sc2001.org) > November 10 - 16, you may be > interested in the following events (if you live in > Denver and would like to > come by the event, drop me a note and I'll get you > in). Sorry for the > marketing-like blurb, but these are technical > presentations and I thought > they may be of interest. > > Thanks, > > jt > > j.t@sun.com > > -------- > > Grid Engine open source BOF > This Birds of a Feather will feature a live walk > through demonstration of > Grid Engine (for newbies) and some review of the > Grid Engine Enterprise > Edition features, and other discussion items of > interest to attendees. This > is part of the Conference Technical Program agenda > at SC2001. > > When: November 15, 2001 > 8:30 - 10am > > Where: Denver Convention Center > > > Grid Computing technical talk > This is a Sun Microsystems sponsored event, designed > to provide a good > exposure to the technical thought leaders and key > topics in Grid Computing. > Moderated by IDC Research Vice President Chris > Willard, the speakers > include: > > Mary Thomas, San Diego Supercomputing Center, on > Grid Access > Keith Gray, BP, on Cluster Grids > Craig Stair, Raytheon, on Campus Grids > Ian Foster, Globus, on Global Grids > Andrew Grimshaw, Avaki, on Global Grids > Ed Seidel, Cactus Project, on Grid Application > Frameworks > Wolfgang Gentzsch, Sun Microsystems, on Sun Grid > Computing > > When: November 14, 2001 > 8:00am - 8:30am Continental Breakfast > 8:30am - 9:30am Presentations > 9:30am - 9:45am Q & A > > Where: Denver Athletic Club > 1325 Glenarm Place > 1 block from the Denver Convention > Center > > > __________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com From ds10025 at cam.ac.uk Thu Nov 8 23:04:25 2001 From: ds10025 at cam.ac.uk (test) Date: Wed Nov 25 01:01:51 2009 Subject: Using NFS with Scyld (-7 ver.) 
References: <20011108105736.B12344@wumpus.foo> Message-ID: <002801c168ec$c57f7640$0301a8c0@cam.ac.uk> Good morning Where can I download a free copy of Scyld? Dan ----- Original Message ----- From: "Greg Lindahl" To: "Beowulf List" Sent: Thursday, November 08, 2001 3:57 PM Subject: Re: Using NFS with Scyld (-7 ver.) > On Mon, Nov 05, 2001 at 02:03:06PM -0500, Donald Becker wrote: > > > It does work: I wrote the original user-level NFS server (unfsd) used by > > Linux, and re-exporting was one of the primary advantages over the Sun > > implementation. > > Not only does re-exporting work, it works well enough that the CPlant > people use it for their single system disk, which is shared by more > than 1,000 nodes. There's 1 node per rack which mounts the one disk > and re-exports it to the other nodes in the rack. The extra caching is > very important to them. > > greg > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From lindahl at conservativecomputer.com Fri Nov 9 01:33:12 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: RDRAM vs SDRAM redux In-Reply-To: ; from rgb@phy.duke.edu on Wed, Nov 07, 2001 at 11:26:39AM -0500 References: Message-ID: <20011109043312.A14638@wumpus.foo> On Wed, Nov 07, 2001 at 11:26:39AM -0500, Robert G. Brown wrote: > Life continues to get more puzzling all the time. We are working out > final configurations for a mixed purchase of P4's and Athlon XP's. Or > so I thought when I started to review the hardware alternatives this > morning. I'm basically getting ready to update a quote from three months > ago but the world has of course changed substantially in the meantime. What you're lacking is a good understanding of your code, cpu vs. memory bandwidth. One way to explore that is to play with your BIOS settings to artificially lower your STREAM bandwidth. Run your code and STREAM both ways, see what happens. > The Athlon update was fairly easy. It looks like the KT266A chipset is > probably the one of choice for a single CPU solution If you don't mind terrible PCI performance, yes. > The P4's are much more difficult because there are now SDRAM chipsets. Right. And if you knew your code's dependence on stream, you'd be all set. > There is also no clear indication on whether using > an SSE compiler with the XP makes a difference -- does the XP support > SSE1 and/or SSE2 instructions? If I recall correctly, the XP has SSE1, and the MP has SSE2, or at least more than the XP has. greg From xyzzy at speakeasy.org Fri Nov 9 02:15:24 2001 From: xyzzy at speakeasy.org (Trent Piepho) Date: Wed Nov 25 01:01:52 2009 Subject: RDRAM vs SDRAM redux In-Reply-To: <20011109043312.A14638@wumpus.foo> Message-ID: On Fri, 9 Nov 2001, Greg Lindahl wrote: > If I recall correctly, the XP has SSE1, and the MP has SSE2, or at > least more than the XP has. I'm pretty sure that neither the MP nor XP has SSE2 support. The Athlon Thunderbird has MMX but not SSE1 support; the MP, and now the XP, added SSE1.
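A direct way to answer this sort of instruction-set question on any given box is to read the flags the kernel detected at boot; a small sketch, assuming a 2.4-era /proc/cpuinfo:

    # show the interesting instruction-set flags of the first CPU;
    # Athlon XP/MP should list sse (plus 3dnow/3dnowext), while
    # sse2 should only appear on a Pentium 4
    grep '^flags' /proc/cpuinfo | head -1 | tr ' ' '\n' | \
        grep -E '^(mmx|sse|sse2|3dnow|3dnowext)$'

This only reports what the kernel recognized, so a very old kernel on a new CPU can under-report.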
From gropp at mcs.anl.gov Fri Nov 9 05:59:33 2001 From: gropp at mcs.anl.gov (William Gropp) Date: Wed Nov 25 01:01:52 2009 Subject: Performance of pararallel programs In-Reply-To: <01110823330104.01108@master> Message-ID: <5.1.0.14.2.20011109074742.04817b40@localhost> At 11:33 PM 11/8/2001 -0200, Claudio Fernandes wrote: >hello, > > I would like to know about any tools to mesure performance of > parallel >programs over mpich in a scyld beowulf cluster . I'm looking for >any trace library that keeps a record of a program's MPI calls MPICH comes with several such libraries in the mpe directory. If you are using the compilation scripts that come with MPICH, you can simply relink with -mpilog. The Jumpshot program provides a graphical display for the data. See http://www.mcs.anl.gov/perfvis/ for more information. Bill From rlatham at plogic.com Fri Nov 9 07:20:14 2001 From: rlatham at plogic.com (Rob Latham) Date: Wed Nov 25 01:01:52 2009 Subject: Scyld iso image In-Reply-To: <3BEA72D5.FA062766@fsagx.ac.be>; from leunen.d@fsagx.ac.be on Thu, Nov 08, 2001 at 12:56:05PM +0100 References: <3BEA72D5.FA062766@fsagx.ac.be> Message-ID: <20011109102014.T27969@otto.plogic.internal> On Thu, Nov 08, 2001 at 12:56:05PM +0100, David Leunen wrote: > Hello, > > Does anyone of you know a ftp site where I can found the iso image of > the latest scyld? I really can't wait the CD from linux central (and it > is an older version). gee, that's funny...i bought a couple scyld cd's from linux central not 3 weeks ago. took all of 3 days for them to come, and if i needed them "yesterday", i could have paid for the fast shipping. it's the "label side up" edition, 27BZ-8 ( january's linuxworld release was 27BZ-7). search the MARC archives ( http://marc.theaimsgroup.com/?l=beowulf&r=1&w=2 ) if you want to see the "free scyld iso download" discussion. ==rob -- [ Rob Latham Developer, Admin, Alchemist ] [ Paralogic Inc. - www.plogic.com ] [ ] [ EAE8 DE90 85BB 526F 3181 1FCF 51C4 B6CB 08CC 0897 ] From raysonlogin at yahoo.com Fri Nov 9 08:48:47 2001 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:01:52 2009 Subject: Fwd: [SSI] RFC: Etherboot/PXE to simplify installation and management Message-ID: <20011109164847.49991.qmail@web11402.mail.yahoo.com> FYI, Rayson --- "Brian J. Watson" wrote: > In an SSI cluster, it should only be necessary to install software > on a single node. Most other nodes can be thin clients, using > Etherboot or PXE to load their kernel and ramdisk from the > CLMS master. A potential CLMS master node needs to have its kernel > and ramdisk stored locally on a SCSI or IDE disk, in case it's > the first node booted in the cluster. Even a potential CLMS master, > however, can initially get its kernel and ramdisk via Etherboot/PXE > and install them onto its hard disk with minimal sysadmin > involvement. > > Etherboot is an open-source software package for creating ROM images > that allow a computer to boot off the network using DHCP or BOOTP. > For those who cannot or will not flash their ROM with one of these > images, Etherboot includes a special boot block for loading the image > from a floppy or hard drive. Etherboot appears to support about > a hundred different NIC models. Unfortunately, it only supports > the x86 platform right now. > > For more information, visit the Etherboot website: > http://etherboot.sourceforge.net/ > > PXE (Preboot Execution Environment) is an Intel specification for > doing pretty much the same thing. 
An advantage is that PXE images > come pre-loaded on certain NICs, but I suspect most PXE images are > closed source. > > To read Intel's PXE spec: > ftp://download.intel.com/ial/wfm/pxespec.pdf > > To support this new dependent node booting model, changes to initial > node installation would include: > - Making sure dhcpd and tftpd are installed as part of the base > Linux distribution. > - Installing mknbi (part of Etherboot) on the shared root for > building a tagged image of the kernel and ramdisk. > - Adding an /etc/ssitab file for specifying the MAC address, > IP address, node number, and local boot flag for each node > allowed to join the cluster. For each node with the local boot > flag set, a device for the boot partition must also be specified. > The local boot flag should only be set for potential CLMS master > nodes on the x86 platform. For platforms not supported by > Etherboot/PXE, such as Alpha, _all_ nodes should have the local > boot flag set. > - Eliminating /etc/cluster.conf, which is obsoleted by /etc/ssitab. > - Installing a new mkdhcpd.ssi command that builds /etc/dhcpd.conf > from the data in /etc/ssitab. To support non-SSI uses of DHCP, > it copies anything it finds in /etc/dhcpd.proto before appending > the generated lines. > - Installing a new lilo.ssi command that does the following: > * reads /etc/lilo.conf and /etc/ssitab, and uses onnode and > lilo > to sync the default kernel and ramdisk out to all potential > nodes that are up with the local boot flag set > * runs mknbi to generate a tagged image of the default kernel > and ramdisk in /tftpboot/, so that dependent nodes can > download it while booting > > In addition, changes will have to be made to the ramdisk, which means > changes to the mkinitrd.ssi script: > - Copy /etc/ssitab into the ramdisk. > - Enhance /linuxrc to match a local MAC address to an entry in > /etc/ssitab to determine the local IP address and node number. > - If the local boot flag is set, then /linuxrc compares the default > kernel and ramdisk on the shared root to those on the local disk. > > If they differ, it runs lilo.ssi with a special flag to just sync > the local disk. > - The hack in VI.3 of the installation instructions will go away. > Dave Zafman and I cooked up a scheme for /linuxrc to read > /proc/partitions and make all the devices it finds there. > That removes the need for the sysadmin to figure out the local > device names of the two GFS partitions. > - As well as building the ramdisk, mkinitrd.ssi also runs > mkdhcpd.ssi, since the sysadmin likely changed /etc/ssitab. > > Adding new nodes -- this is the beautiful part: > - Make sure there are enough available journals for the new nodes > on the GFS shared root. Note that the Cluster Filesystem (CFS) > that Dave is porting doesn't have this requirement, which makes > it better suited for large clusters. > - Edit /etc/ssitab to add records for each new node. The MAC > address can be determined by booting the new node with an > Etherboot floppy or ROM image. Although the DHCP server will > not respond to this unknown MAC address just yet, the node will > display on its console the MAC address of the card it discovered. > - Run mkinitrd.ssi to rebuild the SSI ramdisk and /etc/dhcpd.conf. > - Run lilo.ssi to distribute the new ramdisk to all nodes that are > up with the local boot flag set, and to rebuild the tagged image > in /tftpboot/. > - If a new node does not have the local boot flag set, just boot it > with the appropriate Etherboot/PXE ROM image or floppy. 
Like > magic, > it'll join the cluster. > - If the local boot flag is set, and the platform is x86, boot it > with the ROM image or floppy. While running /linuxrc, it'll sync > the local disk if the boot partition has already been created. > - If the boot partition has not been created, /linuxrc will proceed > with joining the cluster. Once it has joined, run fdisk and mkfs > to set up the boot partition. Then reboot the node one more time > with the ROM image or floppy, so it can sync the local disk the > next time it joins. > - On a platform that does not support Etherboot/PXE, the PITA > factor > is a bit higher for adding a new node (which must have the > local boot flag set). To avoid needless installation of the base > OS, try booting off a distribution CD into rescue mode. Use fdisk > and mkfs to set up the boot partition. Mount it. Either use a > floppy or set up networking to copy the default kernel and > ramdisk > from the cluster to the boot partition. Also, copy the > appropriate > stanza for your bootloader (e.g., aboot), and run it to install > the boot block. Now it's ready to join the cluster. Finally, > consider adding support for your platform to Etherboot or an > equivalent software package. > > Some weaknesses in this proposal are support for non-x86 platforms, > to which I've given some thought, and support for User Mode Linux, > to which I've given very little thought. There are probably other > weaknesses, but overall I think this improves the installation and > management of OpenSSI on the x86 platform. > > Suggestions are definitely welcome, especially since I haven't > started the implementation, yet. ;) > > -- > Brian Watson | "Now I don't know, but I been told it's > Linux Kernel Developer | hard to run with the weight of gold, > Open SSI Clustering Project | Other hand I heard it said, it's > Compaq Computer Corp | just as hard with the weight of lead." > Los Angeles, CA | -Robert Hunter, 1970 > > mailto:Brian.J.Watson@compaq.com > http://opensource.compaq.com/ > > _______________________________________________ > ssic-linux-devel mailing list > ssic-linux-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel __________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com From wsb at paralleldata.com Fri Nov 9 16:49:05 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:52 2009 Subject: Fwd: [SSI] RFC: Etherboot/PXE to simplify installation and management References: <20011109164847.49991.qmail@web11402.mail.yahoo.com> Message-ID: <3BEC7981.1BF493A5@paralleldata.com> Rayson Ho wrote: > > FYI, > > Rayson > > --- "Brian J. Watson" wrote: > > - Adding an /etc/ssitab file for specifying the MAC address, > > IP address, node number, and local boot flag for each node > > allowed to join the cluster. For each node with the local boot > > flag set, a device for the boot partition must also be specified. > > The local boot flag should only be set for potential CLMS master > > nodes on the x86 platform. For platforms not supported by > > Etherboot/PXE, such as Alpha, _all_ nodes should have the local > > boot flag set. Not sure what this guy is thinking but Alphas boot just fine with bootp on SRM supported network cards. That's what Etherboot does for x86 boxes. I doubt Alpha does PXE though. Pretty much all non-x86 UNIX boxes I've used can netboot using bootp. 
Linuxcentral sells x86 bootp capable network cards for those that are interested in that sort of thing. The cards use etherboot. Also, most new mobo's with built-in Enet support PXE from what I've seen. Wes From gkogan at students.uiuc.edu Sat Nov 10 22:31:27 2001 From: gkogan at students.uiuc.edu (german kogan) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: <20011106142455.C19207@kotako.analogself.com> Message-ID: I brought some of my slave nodes up. But in the Memory section for some of them it says 181/251MB (72%). Does this mean that 181 MB of memory are being used for something or that 181MB are free? Thanks From edwards at icantbelieveimdoingthis.com Sat Nov 10 22:25:47 2001 From: edwards at icantbelieveimdoingthis.com (Arthur H. Edwards,1,505-853-6042,505-256-0834) Date: Wed Nov 25 01:01:52 2009 Subject: Scyld iso image References: <3BEA72D5.FA062766@fsagx.ac.be> Message-ID: <3BEE19EB.5020504@icantbelieveimdoingthis.com> David Leunen wrote: >Hello, > >Does anyone of you know a ftp site where I can found the iso image of >the latest scyld? I really can't wait the CD from linux central (and it >is an older version). > >I have a pretty fast connection and it shouldn't be long. I will very >much appreciate if you provide it for me. You can answer me to my >personal e-mail or on this mail-list. > >Thank you. > >David >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > >. > I just got the cd from Linux central for 9.95 total. It took two days and says it is z-8. Art Edwards From agrajag at scyld.com Sun Nov 11 20:35:52 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: ; from gkogan@students.uiuc.edu on Sun, Nov 11, 2001 at 12:31:27AM -0600 References: <20011106142455.C19207@kotako.analogself.com> Message-ID: <20011111233552.A6888@blueraja.scyld.com> On Sun, 11 Nov 2001, german kogan wrote: > > I brought some of my slave nodes up. But in the Memory section for some of > them it says 181/251MB (72%). Does this mean that 181 MB of memory are > being used for something or that 181MB are free? It means that 181M out of 251M are used, and that's approximately 72% of the RAM. When looking at this number, its important to remember that the 181M is the RAM being used by processes on the system, as well as any memory the kernel is using for buffers and cache (such as it uses with filesystems to speed up repeated accesses). -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20011111/18f84b46/attachment.bin From gkogan at students.uiuc.edu Sun Nov 11 20:48:41 2001 From: gkogan at students.uiuc.edu (german kogan) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: <20011111233552.A6888@blueraja.scyld.com> Message-ID: On Sun, 11 Nov 2001, Sean Dilda wrote: > On Sun, 11 Nov 2001, german kogan wrote: > > > > > I brought some of my slave nodes up. But in the Memory section for some of > > them it says 181/251MB (72%). Does this mean that 181 MB of memory are > > being used for something or that 181MB are free? > > It means that 181M out of 251M are used, and that's approximately 72% of > the RAM. 
When looking at this number, it's important to remember that > the 181M is the RAM being used by processes on the system, as well as > any memory the kernel is using for buffers and cache (such as it uses > with filesystems to speed up repeated accesses). > Thanks. But it seems that too much RAM is being used up. All I have done is boot > up the slave nodes, and have not run anything on them. Or is this normal? See what I wrote before. That number includes memory the kernel might be using for buffers and cache. You might also want to try doing 'bpsh free' to see a breakdown of how the memory on the slave node is being used. > > Also, another question is about mpi. I have run a simple test code on > the cluster, and some processes seem to run on the master node. What do I > have to do to prevent this from happening? So that the processes only run on > the slave nodes. I'm assuming you're using -8. When running your MPI job, set NO_LOCAL=1 just like you set the NP -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20011112/72dd0484/attachment.bin From gkogan at students.uiuc.edu Sun Nov 11 21:19:34 2001 From: gkogan at students.uiuc.edu (german kogan) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: <20011112000615.B6888@blueraja.scyld.com> Message-ID: > > > > Also, another question is about mpi. I have run a simple test code on > > the cluster, and some processes seem to run on the master node. What do I > > have to do to prevent this from happening? So that the processes only run on > > the slave nodes. > > I'm assuming you're using -8. When running your MPI job, set NO_LOCAL=1 > just like you set the NP > What do you mean by -8? What does NO_LOCAL=1 mean and do I have to set this every time I run mpi? The command I use for running mpi is 'mpirun -np "number of processes" ./a.out'. Where would I put NO_LOCAL=1?
Thanks From agrajag at scyld.com Sun Nov 11 21:41:54 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: ; from gkogan@students.uiuc.edu on Sun, Nov 11, 2001 at 11:19:34PM -0600 References: <20011112000615.B6888@blueraja.scyld.com> Message-ID: <20011112004154.C6888@blueraja.scyld.com> On Sun, 11 Nov 2001, german kogan wrote: > What do you mean by -8? What does NO_LOCAL=1 mean and do I have to set > this every time I run mpi? The command I use for running > mpi is 'mpirun -np "number of processes" ./a.out'. Where would I put > NO_LOCAL=1? -8 as in 27az-8, 27bz-8 or 27cz-8. I had thought that you were starting your jobs by doing: NP=4 ./a.out in which case you'd do: NP=4 NO_LOCAL=1 ./a.out (you can replace '4' with however many processes you actually want) As you're using mpirun, you can also do: mpirun -np "number of processes" -nolocal ./a.out -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20011112/6973f016/attachment.bin From gkogan at students.uiuc.edu Sun Nov 11 23:26:22 2001 From: gkogan at students.uiuc.edu (german kogan) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: <20011112004154.C6888@blueraja.scyld.com> Message-ID: On Mon, 12 Nov 2001, Sean Dilda wrote: > On Sun, 11 Nov 2001, german kogan wrote: > > > What do you mean by -8? What does NO_LOCAL=1 mean and do I have to set > > this every time I run mpi? The command I use for running > > mpi is 'mpirun -np "number of processes" ./a.out'. Where would I put > > NO_LOCAL=1? > > -8 as in 27az-8, 27bz-8 or 27cz-8. > > I had thought that you were starting your jobs by doing: > NP=4 ./a.out > in which case you'd do: > NP=4 NO_LOCAL=1 ./a.out > > (you can replace '4' with however many processes you actually want) > > As you're using mpirun, you can also do: > mpirun -np "number of processes" -nolocal ./a.out > For some reason I have two copies of mpirun, one in /usr/bin/ and one in /usr/mpi_beowulf/bin. But when I try running some code with the copy in /usr/mpi_beowulf/bin I get the following error "p0_4360: p4_error: net_create_slave: host not a bproc node: -3 p4_error: latest msg from perror: Success", but it does have all the mpi options such as -nolocal etc; it shows me all these when I type something like "mpirun -h". However I can run code with the mpirun from /usr/bin. But when I tried doing /usr/bin/mpirun -np 4 -nolocal ./a.out I get the following error "Failed to exec target program: No such file or directory". Do you have any ideas? Thanks From aby_sinha at yahoo.com Mon Nov 12 01:54:28 2001 From: aby_sinha at yahoo.com (Abhishek Sinha) Date: Wed Nov 25 01:01:52 2009 Subject: vi architecture Message-ID: <3BD5F389001E772D@mail.san.yahoo.com> (added by postmaster@mail.san.yahoo.com) Hello, I had a look at the Virtual Interface architecture and the idea of user-level networking seems good, but I am in doubt whether I can use it for commercial purposes or not. This might seem like a newbie question, but if VI does what it promises, what could be the disadvantages of using it? Anyone having experience with it, please enlighten!! Regards Abhishek Sinha From jakob at unthought.net Mon Nov 12 02:47:33 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:52 2009 Subject: Compile farm?
In-Reply-To: <20011108105814.C12344@wumpus.foo>; from lindahl@conservativecomputer.com on Thu, Nov 08, 2001 at 10:58:14AM -0500 References: <20011104212912.W14001@unthought.net> <20011108105814.C12344@wumpus.foo> Message-ID: <20011112114733.B30421@unthought.net> On Thu, Nov 08, 2001 at 10:58:14AM -0500, Greg Lindahl wrote: > On Sun, Nov 04, 2001 at 09:29:12PM +0100, Jakob Østergaard wrote: > > > Problem is - mosix migrates jobs after a while. Initially a compiler > > takes up a few megabytes of memory, but "after a while" it has grown > > to hundreds of megabytes. When mosix decides to migrate the compiler > > it will spend a long time on the network to move the large process > > image. > > I've never used Mosix. Does it have the ability to set policies like > "this binary should always be immediately migrated at exec" or "all > processes should be migrated at exec"? You'd think it would... and > using such policies would solve this particular problem. Sorry for the lag :) I don't know if you can set "migrate on exec" - I didn't experiment that much with it. I did try to tell it to migrate as early as possible, but couldn't make it do so to my satisfaction... But much has changed since then I'm sure. I don't know if the early migration options allow for migrate-on-exec - there would be some fundamental problems with that too. Mosix considers the CPU/memory requirements of the process and migrates to the host "best suited at that time". Mosix would have to know about gcc, and know that it should migrate early, or (almost) never migrate. I don't know how it would look today. The other problem with Mosix is that it requires you to run the same kernel on all machines. Now parallel compiles are usually done on a fairly homogeneous cluster, but in my situation it's not really an option to run the same kernel revision on all machines. I just keep my headers, libraries and tools homogeneous. -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From Lechner at drs-esg.com Mon Nov 12 07:33:54 2001 From: Lechner at drs-esg.com (Lechner, David) Date: Wed Nov 25 01:01:52 2009 Subject: Good network traffic visualization tools ? Message-ID: I am investigating the performance of a multi-component software program on a cluster wrt various HW and network configurations. Can anyone suggest good tools to help monitor network utilization? I understand that SNMP-enabled switches allow doing this, but I am using generic products now, not branded products with lots of software support. So far I have looked at KSniffer, Cricket, Cheops, IPTraf, and NTop. I am using a mix of programs and need to measure the capability of distributed programs that use direct sockets without MPI. We'd like to see the traffic monitored "real-time" via some color-coded matrix screen that shows traffic BW between nodes; even just a table of values for all traffic within a cluster would be good enough for now - we would snap that data into a vis. tool (an approach common to many of the tools I mention above). I also need Windows support as well as Linux. Thanks in advance - Dave Lechner.
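Short of an SNMP-capable switch, a per-node probe of /proc/net/dev gets most of the way to such a traffic table; a rough sketch (Linux-only; the interface name and sampling interval are assumptions, and the script is illustrative rather than part of any tool named above):

    #!/bin/sh
    # sample rx/tx byte counters for one interface twice, N seconds
    # apart, and print the average throughput over that window
    IF=${1:-eth0}; N=${2:-5}
    snap() { awk -v ifc="$IF:" '$1 ~ ifc { sub(/.*:/, ""); print $1, $9 }' /proc/net/dev; }
    set -- `snap`; RX1=$1 TX1=$2
    sleep $N
    set -- `snap`; RX2=$1 TX2=$2
    echo "$IF rx $(( (RX2 - RX1) / N )) bytes/sec tx $(( (TX2 - TX1) / N )) bytes/sec"

Run it under rsh or bpsh against every node and collect the output into one table, and you have a crude version of the real-time matrix described above.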
From akostocker at hotmail.com Mon Nov 12 09:24:13 2001 From: akostocker at hotmail.com (Tony Stocker) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld Message-ID: Hi there, I'm also seeing this "issue" but on a slightly larger scale. All of my slave nodes (currently at 6) have 1GB of memory yet in the status monitor they all show 670MB used. And the cluster isn't doing anything at all; it's just booted up. What buffers and cache need to eat up this much of the memory? And more importantly, isn't this going to affect performance when I actually do try to run something on the cluster? -Tony _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp From mlrecv at yahoo.com Mon Nov 12 09:43:23 2001 From: mlrecv at yahoo.com (Zhifeng F. Chen) Date: Wed Nov 25 01:01:52 2009 Subject: M-VIA 1.2 problems Message-ID: <00cc01c16ba1$87af16b0$906a7080@divine> Hi, Does anyone have experience with M-VIA 1.2b2? I am trying to compile and install M-VIA 1.2b2 on a RedHat 7.2, 2.4.10, SMP, 1G memory system with an Intel Eepro100 NIC. When I compile the source, some errors come up, and I fixed them by following the FAQ (change gcc to kgcc; add #undef min in front of #define min in vipk_core/vipk_rmm.h; remove { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82820FW_4,PCI_ANY_ID, PCI_ANY_ID, }, from eepro100.c). The compilation was successful. After I installed the device drivers (via_lo.o, via_eth0.o), created the /dev/* entries, and ran modprobe via_lo via_eth0, I tried to test them by using vnettest /dev/via_lo r localhost, and vnettest /dev/via_lo s localhost on the same machine. The machine becomes completely dead. Does anyone have ideas or comments on what is happening? ZF -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20011112/58672e7f/attachment.html From mlrecv at yahoo.com Mon Nov 12 13:43:04 2001 From: mlrecv at yahoo.com (Zhifeng F. Chen) Date: Wed Nov 25 01:01:52 2009 Subject: Compilation problem. Message-ID: <01b501c16bc3$02c6c3e0$906a7080@divine> Hi, When compiling mvich-1.0a6.1 under mpich-1.2.2.3, ./configure --with-device=via --with-arch=LINUX --without-romio -cflags="-DUSE_STDARG -O2 -DCPU_X86 -DNIC_GIGANET -DVIPL095" -lib="-lgnivipl -lpthread" is fine.
When I came to make, it reports: cc1: warnings being treated as errors queue.c: In function `MPID_Search_unexpected_for_request': queue.c:296: warning: implicit declaration of function `MPID_AINT_CMP' make[3]: *** [queue.o] Error 1 Exit status from make was 2 make[2]: *** [mpilib] Error 1 make[1]: *** [mpi-modules] Error 2 make: *** [mpi] Error 2 Can anyone help me out? ZF -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20011112/2c787f39/attachment.html From hanzl at noel.feld.cvut.cz Tue Nov 13 00:21:32 2001 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:01:52 2009 Subject: DUPLICATE MESSAGES ON THIS LIST - Compaq makes mess again Message-ID: <20011113092132B.hanzl@unknown-domain> Once again, a misconfigured mailer at a Compaq site sends list messages back to this list (and most likely to ANY list) and makes them look like duplicates from the original author. Now it comes via zmamail04.zma.compaq.com. (Last time it came via zcamail4.zca.compaq.com and zcamail5.zca.compaq.com) If you think somebody is mad, it is not you or the message author, it is the Compaq postmaster. If they do have any. Regards Vaclav > Subject: RE: beowulf.org list problems > From: Peter Bowen > To: hanzl@noel.feld.cvut.cz > Cc: Nathalie.Viollet@Compaq.com > Date: 01 Nov 2001 08:57:49 -0500 > X-Mailer: Evolution/0.16.99+cvs.2001.10.31.15.22 (Preview Release) > > I have marked all messages from zcamail[0-9]*.zca.compaq.com for hand > moderation. I expect that this mailer problem will be fixed, but if I > see messages caught on or after Nov 8, I will disable all offending > mailing list memberships. > > I do not like taking harsh measures, but clearly this is causing > problems for many people, and is caused by a simple server > misconfiguration. > > Thanks. > Peter > > On Thu, 2001-11-01 at 05:27, hanzl@noel.feld.cvut.cz wrote: > > Peter, > > > > > If I see this re-occur, I will take appropriate measures > > > > It re-occurred. Please blacklist them; those Compaq guys are > > unable to mend their mailer. The Beowulf list is unusable with this mess > > included. > > > > Thanks > > > > Vaclav > > > > ------------------------------- > > > > > > > Subject: RE: beowulf.org list problems > > > From: Peter Bowen > > > To: hanzl@noel.feld.cvut.cz > > > Cc: Nathalie.Viollet@Compaq.com > > > Date: 29 Oct 2001 10:13:04 -0500 > > > X-Mailer: Evolution/0.15.99 (Preview Release) > > > > > > I am no longer seeing duplicate posts on the list, and, therefore, will > > > not be blacklisting zcamail?.zca.compaq.com from beowulf.org lists. If > > > I see this re-occur, I will take appropriate measures at that time. > > > > > > Thanks. > > > Peter > > > > > > On Mon, 2001-10-29 at 04:29, hanzl@noel.feld.cvut.cz wrote: > > > > Hi Nathalie, > > > > > > > > thanks for the swift reaction. > > > > > > > > > But I did not know that my e-mail was forwarded many times > > > > > > > > Your mail was OK, it was messages from OTHER people which got repeated > > > > by your site (and made to look like repeated mail). > > > > > > > > You will also find (as everybody else subscribed to beowulf) those copies > > > > in your personal mailbox (if you kept beowulf messages delivered to > > > > you). Last repeated message I got was from Ron Chen (Subject: Re: > > > > [PBS-USERS] SC2001 technical papers online). 
> > > > > > > > Second copy of this message went this way: > > > > > > > > Received: from zcamail04.zca.compaq.com (zcamail04.zca.compaq.com [161.114.32.104]) > > > > by blueraja.scyld.com (8.11.6/8.11.6) with ESMTP id f9T27l032336 > > > > for ; Sun, 28 Oct 2001 21:07:47 -0500 > > > > Received: by zcamail04.zca.compaq.com (Postfix, from userid 12345) > > > > id 5AB25F9A; Sun, 28 Oct 2001 18:10:34 -0800 (PST) > > > > Received: from excmun-gh01.dem.cpqcorp.net (excmun-gh01.dem.cpqcorp.net [16.41.88.60]) > > > > by zcamail04.zca.compaq.com (Postfix) with ESMTP > > > > id 7D447E4D; Sun, 28 Oct 2001 18:10:32 -0800 (PST) > > > > Received: by excmun-gh01.dem.cpqcorp.net with Internet Mail Service (5.5.2650.21) > > > > id ; Mon, 29 Oct 2001 03:07:36 +0100 > > > > > > > > Here is a list of messages like this (first is your regular post, > > > > others are probably copies of other people's mails): > > > > > > > > ~/Mail/beowulf>for x in `fgrep -l .zca.compaq.com *`; do fgrep Subject $x; done > > > > > > > > Subject: 2.4 network booted kernel > > > > Subject: Re: good commodity NIC > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: good commodity NIC > > > > Subject: Core files under mpich, p4 device > > > > Subject: Re: good commodity NIC > > > > Subject: good commodity NIC > > > > Subject: good commodity NIC > > > > Subject: Add host in PVM > > > > Subject: Killer SCSI 1 TB fileserver > > > > Subject: Core files under mpich, p4 device > > > > Subject: Re: PBS x SGE comparison? > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: SGE and Scyld > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: Core files under mpich, p4 device > > > > Subject: Re: FNN vs GigabitEther & Myrinet > > > > Subject: good commodity NIC > > > > Subject: Re: PBS x SGE comparison? > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Killer SCSI 1 TB fileserver > > > > Subject: Failed to mount using Scyld > > > > Subject: Re: good commodity NIC > > > > Subject: good commodity NIC > > > > Subject: RE: using mpich 1.2.2.2 with ifc > > > > Subject: Add host in PVM > > > > Subject: Re: [PBS-USERS] SC2001 technical papers online > > > > > > > > Please forward this additional explanation to your site adminsitrator > > > > (unless (s)he already knows for sure what was going on). > > > > > > > > Regards > > > > > > > > Vaclav > > > > > > > From P.Waltner at science-computing.de Tue Nov 13 04:06:39 2001 From: P.Waltner at science-computing.de (Peter Waltner) Date: Wed Nov 25 01:01:52 2009 Subject: NFSv3-Client bug with large files in kernel-2.2.19-13.beo Message-ID: <200111131206.NAA0000015404@trantor.science-computing.de> Large file support works only partially with NFSv3 with the Beowulf- kernel-2.2.19-13.beo. I can write a file > 2GB over NFS, but I can only read the first 2 GB of the file over NFS. 
Also, ls shows the wrong file size in NFS directories. Local: waltnepe@ce05sl14 /home/waltnepe > ls -lh /scr/ce05sl14/scr1/waltnepe/ -rwxr-x--- 1 waltnepe admin 2.1G Nov 13 10:51 Bonnie.12536 NFS: waltnepe@ce05sl16 /home/waltnepe > ls -lh /scr/ce05sl14/scr1/waltnepe/ -rwxr-x--- 1 waltnepe admin 2.0G Nov 13 10:51 Bonnie.12536 I checked this with Linux and Irix NFS servers. Peter From jnellis at dslextreme.com Tue Nov 13 15:21:40 2001 From: jnellis at dslextreme.com (Joe Nellis) Date: Wed Nov 25 01:01:52 2009 Subject: Upgrading to 27bz-8 Message-ID: <000e01c16c99$f374d1c0$73f2a540@dslextreme.com> Greetings, We would like to install this new version of the Scyld software and we are currently running the -7 version. When we originally installed -7 our nodes didn't have floppies or cdroms, so we had to crack each case and hook up a floppy to get it booted once. This took considerable time. Once all the nodes were booted, we moved the boot image to each node's individual harddisk. Now I am wondering how we can avoid this again. If we install -8 onto our master node, will the nodes come up in enough of a condition with their -7 boot image to rewrite a new boot image to their harddisks? Otherwise I am assuming I will need a -8 boot image to write to the node disks before I even install -8 on the master. I hope this isn't confusing. thanks, Joe Nellis jnellis@dslextreme.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20011113/1ebec38a/attachment.html From jlong at arsc.edu Tue Nov 13 15:37:13 2001 From: jlong at arsc.edu (James Long) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: References: Message-ID: Free memory is wasted memory. It is used as a disk cache until needed by an application, at which time the disk cache is reduced. Jim At 5:24 PM +0000 11/12/01, Tony Stocker wrote: >Hi there, > >I'm also seeing this "issue" but on a slightly larger scale. All of my >slave nodes (currently at 6) have 1GB of memory yet in the status monitor they >all show 670MB used. And the cluster isn't doing anything at all; it's just >booted up. What buffers and cache need to eat up this much of the memory? >And more importantly, isn't this going to affect performance when I actually >do try to run something on the cluster? > >-Tony > >_________________________________________________________________ >Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf -- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% James Long MPP Specialist Arctic Region Supercomputing Center University of Alaska Fairbanks PO Box 756020 Fairbanks, AK 99775-6020 jlong@arsc.edu (907) 474-5731 work (907) 474-5494 fax %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
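The buffers/cache split James describes is visible directly in free's output; a quick sketch for a Scyld-style cluster (node number 0 and the -m flag are just for illustration):

    # memory use on one slave node, in MB; the '-/+ buffers/cache'
    # row shows what processes really consume, since buffer and
    # cache pages are handed back as soon as applications want them
    bpsh 0 free -m

On the master itself, plain 'free -m' gives the same breakdown.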
>
>-Tony
>
>_________________________________________________________________
>Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp
>
>_______________________________________________
>Beowulf mailing list, Beowulf@beowulf.org
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf

--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
James Long
MPP Specialist
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks, AK 99775-6020
jlong@arsc.edu
(907) 474-5731 work
(907) 474-5494 fax
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

From hanzl at noel.feld.cvut.cz Wed Nov 14 03:09:55 2001 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:01:52 2009 Subject: Upgrading to 27bz-8 In-Reply-To: <000e01c16c99$f374d1c0$73f2a540@dslextreme.com> References: <000e01c16c99$f374d1c0$73f2a540@dslextreme.com> Message-ID: <20011114120955L.hanzl@unknown-domain>

I guess that once you have managed to beoboot your nodes from their hard disks (i.e. you have a working beoboot partition, type 89), you should never have to touch the nodes again; they should boot from the newly installed master as well. (Unless you change the network card on a slave node; installing a 3c905C, for example, might cause you problems.)

Regards

Vaclav

> From: "Joe Nellis"
>
> Greetings,
>
> We would like to install this new version of the Scyld software; we are
> currently running the -7 version. When we originally installed -7 our
> nodes didn't have floppies or CD-ROMs, so we had to crack each case and
> hook up a floppy to get it booted once. This took considerable time.
> Once all the nodes were booted, we moved the boot image to each node's
> individual hard disk. Now I am wondering how we can avoid this again.
> If we install -8 onto our master node, will the nodes come up in enough
> of a condition with their -7 boot image to rewrite a new boot image to
> their hard disks? Otherwise I am assuming I will need a -8 boot image
> to write to the node disks before I even install -8 on the master. I
> hope this isn't confusing.
>
> thanks,
> Joe Nellis
> jnellis@dslextreme.com

From John.ws.Strange at marconi.com Wed Nov 14 05:58:26 2001 From: John.ws.Strange at marconi.com (Strange, John) Date: Wed Nov 25 01:01:52 2009 Subject: Compile farm? Message-ID: <313680C9A886D511A06000204840E1CF3F0F3B@whq-msgusr-02.pit.comms.marconi.com>

Well, you can use mexec with MOSIX to get things to work, and it does work quite well, but it doesn't scale because of some underlying filesystem problems we are having. I've got 25 machines; our backend storage currently is NetApp filers, so using NFS I have to turn off client-side caching. It basically crushes the filer doing constant file-handle lookups.

I'm still playing with a NetApp that we have spare; maybe I'll have some luck finding a way around the problems that we are having. There is no really good backend filesystem that you can use; maybe GFS, but it's still relatively new and too bleeding-edge for practical use (IMHO). Plus we don't have the hardware for it (fiber channel) and we have *NO* budget.

If anyone has any suggestions I would be glad to hear them.

Thanks, John Strange Marconi john.ws.strange.at.marconi.com
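For reference, client-side attribute caching on a Linux NFS mount is normally controlled with the standard noac/actimeo mount options; this is a generic sketch, with server:/vol0 and /mnt/build as placeholder names rather than John's actual configuration:

    # disable attribute caching entirely: every stat goes back to the filer
    mount -t nfs -o noac server:/vol0 /mnt/build

    # or keep caching but shorten the attribute timeout to 1 second
    mount -t nfs -o actimeo=1 server:/vol0 /mnt/build

The tradeoff is exactly the one John describes: noac gives coherent metadata for parallel builds but hammers the server with lookups, while actimeo lets you tune how stale you can afford to be.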
-----Original Message----- From: Scott Thomason [mailto:SThomaso@phmining.com] Sent: Friday, November 02, 2001 2:25 PM To: Beowulf@beowulf.org Subject: Compile farm?

Greetings. I'm interested in setting up a shell account/batch process/compile farm system for our developers, and I'm wondering if Beowulf clusters are well suited to that task. We're not interested in writing parallel code using PVM or MPI, we just want to log into what appears to be one big server and have it dispatch the workload amongst the slave processors. Is Beowulf good at that?
---scott

p.s. Sorry if there are duplicates of this message; I used the wrong email address earlier.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From hanzl at noel.feld.cvut.cz Wed Nov 14 08:49:09 2001 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:01:52 2009 Subject: NFS export from clients? Message-ID: <20011114174909J.hanzl@unknown-domain>

Did anybody manage to NFS-export disks from Scyld nodes? I suppose I should start the NFS daemons from the node_up script and re-export the filesystem out of the cluster using unfsd. (I would also like to mount node hard disks using autofs, to maybe avoid fsck after a node crash.)

I do not want PVFS; I want individual, independent filesystems on the nodes (there are several large data sets and often just one of them is used; other nodes may even be off).

I will try, but if anybody already has some experience with this, I would be happy to hear from him.

Thanks

Vaclav

From raij at cs.unc.edu Wed Nov 14 15:10:16 2001 From: raij at cs.unc.edu (Andrew B. Raij) Date: Wed Nov 25 01:01:52 2009 Subject: Public slaves Message-ID:

Hi everybody, I'd like to set up a Scyld cluster with slaves open to the public network. I'd also like each slave to get the same IP of my choosing every time it is booted, and slave IPs shouldn't have to be confined to any specific range. I understand that doing this is contradictory to the Beowulf design, but is it possible?

thanks, -Andrew

From ron_chen_123 at yahoo.com Wed Nov 14 21:43:54 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:52 2009 Subject: Fwd: [Globus-discuss] Globus Toolkit in the news Message-ID: <20011115054354.76538.qmail@web14704.mail.yahoo.com>

Entropia's distributed computing used in grid computing.

-Ron

--- Ian Foster wrote:
> Date: Wed, 14 Nov 2001 15:23:58 -0600
> To: discuss@globus.org, management@globus.org
> From: Ian Foster
> Subject: [Globus-discuss] Globus Toolkit in the news
>
> http://news.cnet.com/news/0-1003-200-7849355.html
>
> _______________________________________________________________
> Ian Foster                      http://www.mcs.anl.gov/~foster
> Math & Computer Science Div.    Dept of Computer Science
> Argonne National Laboratory     The University of Chicago
> Argonne, IL 60439, U.S.A.       Chicago, IL 60637, U.S.A.
> 630 252 4619 (fax 5986)         773 702 3487 (fax 8487)

__________________________________________________ Do You Yahoo!? Find the one for you at Yahoo!
Personals http://personals.yahoo.com

From bdorland at kendall.umd.edu Thu Nov 15 07:48:08 2001 From: bdorland at kendall.umd.edu (Bill Dorland) Date: Wed Nov 25 01:01:52 2009 Subject: Scyld 27bz-8 problem (symptom: netstat) Message-ID: <200111151548.fAFFm8K20412@kendall.umd.edu>

I recently purchased the $2.95 copy (version 27bz-8) of Scyld and have experienced some difficulties with the installation. Before putting together a long post, I'd like to know if anyone else has successfully performed a diskless installation of

Scyld Beowulf "Label Side Up" Edition Copyright 2001 Scyld Computing Corp. P/N: 27BZ-8

If so, I am curious whether anyone else has experienced an incorrect response from the command 'netstat -avupt' when executed as root. I find that the system does not believe root is root. I have not connected my cluster to the internet, and I installed to a new, blank hard drive. No other software has been introduced.

In a completely unrelated incident, another system that I work on was compromised some time back by a rootkit which exploited a vulnerability in SSH, and interestingly, one of the early symptoms of trouble on that system was this same thing: root execution of 'netstat -avupt' complained that root was not root. The version of SSH that is shipped with 27bz-8 is in fact vulnerable to the attack that I experienced on this unrelated system. I am therefore concerned that something that is admittedly quite unlikely might have happened, i.e., that the 27bz-8 distribution was shipped despite having been compromised in some way. I would be very happy to hear from anyone who can assure me that this is not the case by providing some explanation for the odd netstat behavior. In the meantime, I have spent several days tracking down this problem and will continue to do so.

Since the openssh rpms shipped with Scyld are modified to be compatible with LFS (not to mention the kernel and so on), I cannot trivially recover from this problem, if it is indeed a problem. Also, I cannot find any patches or updates to the 27bz-8 release on line.

This is my first post to a public list server. I apologize in advance for any breach of netiquette.

--Bill

From wrankin at ee.duke.edu Thu Nov 15 09:35:16 2001 From: wrankin at ee.duke.edu (William T. Rankin) Date: Wed Nov 25 01:01:52 2009 Subject: Public slaves In-Reply-To: <200111151701.fAFH15029660@blueraja.scyld.com> Message-ID:

> From: "Andrew B. Raij"
>
> Hi everybody,
>
> I'd like to set up a Scyld cluster with slaves open to the public
> network. I'd also like each slave to get the same IP of my choosing every
> time it is booted, and slave IPs shouldn't have to be confined to any
> specific range. I understand that doing this is contradictory to the
> Beowulf design, but is it possible?

What you are talking about is to set up all the nodes as general purpose workstations and using them as a cluster. This isn't "contrary" to the beowulf design (that's how my first cluster was set up). It is contrary IIRC to the basic Scyld assumptions.

Have you considered just using kickstart with a standard linux distribution to configure your machines? Or is there something specific to Scyld that you are interested in?
-bill

From Nirmal.Bissonauth at durham.ac.uk Thu Nov 15 09:49:24 2001 From: Nirmal.Bissonauth at durham.ac.uk (Nirmal Bissonauth) Date: Wed Nov 25 01:01:52 2009 Subject: Tyan Tunder K7 and Gigabit Ethernet cards Message-ID:

Hi all, I would like to know if people have been successful in using gigabit ethernet (over copper) cards with the Tyan Thunder K7 motherboard (S2462). This has two built-in 3Com 100Base-T cards.

I have tried to use a D-Link DGE-550T card with the latter, but without much success. Even after disabling the onboard NICs, the card did not work properly. The problem is that an interrupt is not set after a DMA transmit (something to do with the APIC, I presume). I tried Linux kernel 2.4.12-ac3 with the latest driver from D-Link's website, but that did not make much difference either. I have six of these.

The cards that I am particularly interested to hear about are the
3Com Gigabit Network card (3c996-T) or (3c996B-T) or (3c1000-T)
Intel PRO/1000 Gigabit Server 1000B-T PCI Adapter (PWLA8490T)
NETGEAR 100/1000Base-T Copper Gigabit Ethernet Adapter (GA-623T)
Netgear 100/1000BASET 32/64 BIT PCI Gigabit Adapter (GA-622T)
SMC TigerCard 1000BaseTX 32/64Bit PCI Gb Ethernet Nic (SMC9462TX)

Or any other cheap gigabit network cards.

Regards Nirmal

-----------------------------------------------------------------------
Nirmal Bissonauth email: nirmal.bissonauth@durham.ac.uk
University of Durham www: http://aig-www.dur.ac.uk
-----------------------------------------------------------------------

From joelja at darkwing.uoregon.edu Thu Nov 15 10:35:27 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:52 2009 Subject: Tyan Tunder K7 and Gigabit Ethernet cards In-Reply-To: Message-ID:

I'd recommend trying the latest natsemi dp8381x driver (1.11) from Donald Becker. It's on the Scyld website, or in the recent kernel sources.

joelja

On Thu, 15 Nov 2001, Nirmal Bissonauth wrote:
> Hi all,
>
> I would like to know if people have been successful in using gigabit
> ethernet (over copper) cards with the Tyan Thunder K7 motherboard (S2462).
> This has two built-in 3Com 100Base-T cards.
>
> I have tried to use a D-Link DGE-550T card with the latter, but without much
> success. Even after disabling the onboard NICs, the card did not work
> properly. The problem is that an interrupt is not set after a DMA
> transmit (something to do with the APIC, I presume). I tried Linux kernel
> 2.4.12-ac3 with the latest driver from D-Link's website, but that did not
> make much difference either. I have six of these.
>
> The cards that I am particularly interested to hear about are the
> 3Com Gigabit Network card (3c996-T) or (3c996B-T) or (3c1000-T)
> Intel PRO/1000 Gigabit Server 1000B-T PCI Adapter (PWLA8490T)
> NETGEAR 100/1000Base-T Copper Gigabit Ethernet Adapter (GA-623T)
> Netgear 100/1000BASET 32/64 BIT PCI Gigabit Adapter (GA-622T)
> SMC TigerCard 1000BaseTX 32/64Bit PCI Gb Ethernet Nic (SMC9462TX)
>
> Or any other cheap gigabit network cards.
>
> Regards
> Nirmal
>
> -----------------------------------------------------------------------
> Nirmal Bissonauth email: nirmal.bissonauth@durham.ac.uk
> University of Durham www: http://aig-www.dur.ac.uk
> -----------------------------------------------------------------------
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
--------------------------------------------------------------------------
Joel Jaeggli              joelja@darkwing.uoregon.edu
Academic User Services    consult@gladstone.uoregon.edu
PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E
--------------------------------------------------------------------------
It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843.

From math at velocet.ca Thu Nov 15 10:44:34 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:52 2009 Subject: Tyan Thunder K7 and Gigabit Ethernet cards In-Reply-To: ; from Nirmal.Bissonauth@durham.ac.uk on Thu, Nov 15, 2001 at 05:49:24PM +0000 References: Message-ID: <20011115134434.Y66460@velocet.ca>

On Thu, Nov 15, 2001 at 05:49:24PM +0000, Nirmal Bissonauth's all...
> Hi all,
>
> I would like to know if people have been successful in using gigabit
> ethernet (over copper) cards with the Tyan Thunder K7 motherboard (S2462).
> This has two built-in 3Com 100Base-T cards.
>
> I have tried to use a D-Link DGE-550T card with the latter, but without much
> success. Even after disabling the onboard NICs, the card did not work
> properly. The problem is that an interrupt is not set after a DMA
> transmit (something to do with the APIC, I presume). I tried Linux kernel
> 2.4.12-ac3 with the latest driver from D-Link's website, but that did not
> make much difference either. I have six of these.

I have a single DGE-500T sitting on a Tyan Tiger talking to an SMC card also based on the NS82830 chipset. No problems sending data EXCEPT in FreeBSD: soon after a lot of data flows bidirectionally, the card drops the carrier. (In fact, any NS82830 card has the same problem.) Or at least the OS tells me that. I think there's a problem with the FBSD nge driver, but that might be fixed (I think I'm on 4.3 here).

With a Linux 2.4.13 kernel (can't remember if I dumped appropriate -ac patches in or not - I usually do) with the appropriate GbE drivers compiled in, I had no problems with dropped carriers except on one card that was finicky with everything. But otherwise I've had no problems. You on Linux or FreeBSD? (How many other people use FBSD for clustering?)

> The cards that I am particularly interested to hear about are the
> 3Com Gigabit Network card (3c996-T) or (3c996B-T) or (3c1000-T)
> Intel PRO/1000 Gigabit Server 1000B-T PCI Adapter (PWLA8490T)
> NETGEAR 100/1000Base-T Copper Gigabit Ethernet Adapter (GA-623T)
> Netgear 100/1000BASET 32/64 BIT PCI Gigabit Adapter (GA-622T)
> SMC TigerCard 1000BaseTX 32/64Bit PCI Gb Ethernet Nic (SMC9462TX)

Got a mix and match of GbE cards here - SMC 9452 (not -62), a couple of ARKs and a Linksys, all based on the 82830 chip. Only one did not work - it would drop carrier as soon as any amount of data went through it; but then again I'm not using Cat6e here, just Cat5+ cables, which worked between all other pairs of my 6 cards.

I find relatively low system/interrupt time spent on the 82830 cards, like 2% CPU for sending about 150-200Mbps down the wire and receiving the same at the same time with avg. size 1K packets (gromacs 3.0 d.dppc benchmark).

Here's an interesting thing that I came across on the link on /. today regarding FBSD vs Linux: http://www.byte.com/documents/s=1794/byt20011107s0001/1112_moshe.html

    On the Linux side, I attached all interrupts coming from the network adaptor to one CPU. With the new TCP/IP stack in the 2.4 kernels this really becomes necessary. Otherwise, you might find the incoming packets arranged out of order, because later interrupts are serviced (on another CPU) before earlier ones, thus requiring a reordering further down the handling layers.

Can FreeBSD do that? Sounds like a way to ensure further efficiency.

/kc

> Or any other cheap gigabit network cards.
>
> Regards
> Nirmal
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA
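The trick quoted from the byte.com article is exposed through /proc on Linux 2.4 kernels; a minimal sketch (the interface name and IRQ number below are invented for the example, not anything from Ken's setup):

    #!/bin/sh
    # find out which IRQ the NIC owns (eth1 is an example)
    grep eth1 /proc/interrupts

    # say it reported IRQ 24: pin it to CPU 0 only.
    # the file holds a hex bitmask of allowed CPUs (1 = CPU0, 2 = CPU1, 3 = both)
    echo 1 > /proc/irq/24/smp_affinity

    cat /proc/irq/24/smp_affinity   # verify the new mask

Whether FreeBSD has an equivalent knob is exactly the open question above.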
From mas at ucla.edu Thu Nov 15 10:59:06 2001 From: mas at ucla.edu (Michael Stein) Date: Wed Nov 25 01:01:52 2009 Subject: Scyld 27bz-8 problem (symptom: netstat) In-Reply-To: <200111151548.fAFFm8K20412@kendall.umd.edu>; from Bill Dorland on Thu, Nov 15, 2001 at 10:48:08AM -0500 References: <200111151548.fAFFm8K20412@kendall.umd.edu> Message-ID: <20011115105906.A22242@mas1.oac.ucla.edu>

> If so, I am curious whether anyone else has experienced an incorrect
> response from the command 'netstat -avupt' when executed as root. I
> find that the system does not believe root is root.

I see this on several systems, one behind a firewall and I'd guess never attacked (the firewall doesn't allow inbound anything, even ssh). This is a RH 7.0 system with kernel 2.2.16-22. netstat is from net-tools-1.56 (RH 7.0).

I suspect it's just a partially built internal file control block of some sort in the kernel. Find the process id for "[mdrecoveryd]", cd to /proc/<pid> and then try to ls the fd directory.

I traced it this far by running a recompiled (COPTS=-g) netstat under gdb as root with a breakpoint in netstat.c function prg_cache_load, where the variable eacces gets set to 1. Further tracing would probably have to be in the kernel.

From raij at cs.unc.edu Thu Nov 15 11:01:05 2001 From: raij at cs.unc.edu (Andrew B. Raij) Date: Wed Nov 25 01:01:52 2009 Subject: Public slaves In-Reply-To: Message-ID:

I've heard much about Scyld's cluster management tools, so I thought it made sense to stick with Scyld and modify things to fit my situation. If I were to use kickstart and a standard Linux distro, what would I be losing from Scyld?

-Andrew

On Thu, 15 Nov 2001, William T. Rankin wrote:
> > From: "Andrew B. Raij"
> >
> > Hi everybody,
> >
> > I'd like to set up a Scyld cluster with slaves open to the public
> > network. I'd also like each slave to get the same IP of my choosing every
> > time it is booted, and slave IPs shouldn't have to be confined to any
> > specific range. I understand that doing this is contradictory to the
> > Beowulf design, but is it possible?
> > What you are talking about is to set up all the nodes as general > purpose workstations and using them as a cluster. This isn't > "contrary" to the beowulf design (that's how my first cluster was > set up). It is contrary IIRC to the basic Scyld assumptions. > > Have you considered just using kickstart with a standard linux > distribution to configure your machines? Or is there something > specific to Scyld that you are interested in? > > -bill > > From jgl at unix.shell.com Thu Nov 15 11:22:10 2001 From: jgl at unix.shell.com (J. G. LaBounty) Date: Wed Nov 25 01:01:52 2009 Subject: Tyan Tunder K7 and Gigabit Ethernet cards In-Reply-To: Your message of Thu, 15 Nov 2001 17:49:24 +0000. Message-ID: <200111151922.NAA08339@volta.shell.com> We are using the Intel PRO/1000T adapter on a Tyan S2460 but you have to build the driver which Intel supplies on their web site. This makes it a pain to install but once you have they perform well. The kernel we are using is 2.4.13-ac5. > From: Nirmal Bissonauth > Reply-To: Nirmal Bissonauth > To: beowulf@beowulf.org > Subject: Tyan Tunder K7 and Gigabit Ethernet cards > Date: Thu, 15 Nov 2001 17:49:24 +0000 (GMT) > > Hi all, > > I would like to know if people have been successful in using gigabit > ethernet(over copper) cards with the Tyan Thunder K7 motherboard. (s2462) > This has two built-in 3com 100 Base T cards. > > I have tried to use a DLINK DGE-550T card with the latter but without much > success. Even after disabling the onboard NICs, the card did not work > properly. The problem is that an interrupt is not set after a DMA > transmitt (something to do with the APIC I presume). I tried linux kernel > 2.4.12-ac3 with the latest driver from Dlinks website, but that did not > make much difference either. I have six of these. > > The cards that I am particularly interested to hear about are the > 3com Gigabit Network card (3c996-T) or (3c996B-T) or (3c1000-T) > Intel PRO/1000 Gigabit Server 1000B-T PCI Adapter (PWLA8490T) > NETGEAR 100/1000Base-T Copper Gigabit Ethernet Adapter (GA-623T) > Netgear 100/1000BASET 32/64 BIT PCI Gigabit Adapter (GA-622T) > SMC TigerCard 1000BaseTX 32/64Bit PCI Gb Ethernet Nic (SMC9462TX) > > Or any other cheap gigabit network cards. > > Regards > Nirmal > > ----------------------------------------------------------------------- > Nirmal Bissonauth email: nirmal.bissonauth@durham.ac.uk > University of Durham www: http://aig-www.dur.ac.uk > ----------------------------------------------------------------------- > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Peter.Lindgren at experian.com Thu Nov 15 11:54:34 2001 From: Peter.Lindgren at experian.com (Peter Lindgren) Date: Wed Nov 25 01:01:52 2009 Subject: Comparison of clustering systems? Message-ID: Top500 has a poll asking "what cluster system do you use?". They offer the choices: Oscar Scyld Score SCE NPACI Rocks MSC.Linux Other Follow this link to see current results: http://clusters.top500.org/pollbooth.php?qid=clustersys&aid=-1 So far I've actually installed and gotten my application running with Scyld. I have most of the other systems on CD. I was about to try Rocks. But to try them all is going to take me a long time.... I wonder whether anyone has recently done a comparison of these (or other) systems? (I also know of Cplant and IBM's CSK.) 
I found an NHSE article from 1996, but that's ancient history.

Peter Lindgren
Phone: 847 944 4515
Fax: 847 517 5889
E-mail: peter.lindgren@experian.com

From SGaudet at turbotekcomputer.com Thu Nov 15 12:38:35 2001 From: SGaudet at turbotekcomputer.com (Steve Gaudet) Date: Wed Nov 25 01:01:52 2009 Subject: Tyan Tunder K7 and Gigabit Ethernet cards Message-ID: <3450CC8673CFD411A24700105A618BD6170F5B@911TURBO>

Hello,

> I would like to know if people have been successful in using gigabit
> ethernet (over copper) cards with the Tyan Thunder K7 motherboard (S2462).
> This has two built-in 3Com 100Base-T cards.
>
> I have tried to use a D-Link DGE-550T card with the latter, but without much
> success. Even after disabling the onboard NICs, the card did not work
> properly. The problem is that an interrupt is not set after a DMA
> transmit (something to do with the APIC, I presume). I tried Linux kernel
> 2.4.12-ac3 with the latest driver from D-Link's website, but that did not
> make much difference either. I have six of these.
>
> The cards that I am particularly interested to hear about are the
> 3Com Gigabit Network card (3c996-T) or (3c996B-T) or (3c1000-T)
> Intel PRO/1000 Gigabit Server 1000B-T PCI Adapter (PWLA8490T)
> NETGEAR 100/1000Base-T Copper Gigabit Ethernet Adapter (GA-623T)
> Netgear 100/1000BASET 32/64 BIT PCI Gigabit Adapter (GA-622T)
> SMC TigerCard 1000BaseTX 32/64Bit PCI Gb Ethernet Nic (SMC9462TX)
>
> Or any other cheap gigabit network cards.
> --------------------------------------------------------------

Ever look at SysKonnect? http://www.syskonnect.com/syskonnect/products/sk-98xx.htm Had very good luck with them, and they are based in Europe.

Cheers, Steve Gaudet ..... <(???)>
============================================================================
| Turbotek Computer Corp.      tel:603-666-3062 ext. 21                    |
| 8025 South Willow St.        fax:603-666-4519                            |
| Manchester, NH 03103         e-mail:sgaudet@turbotekcomputer.com         |
| toll free:800-573-5393       web: http://www.turbotekcomputer.com        |
============================================================================

From opengeometry at yahoo.ca Thu Nov 15 13:11:10 2001 From: opengeometry at yahoo.ca (William Park) Date: Wed Nov 25 01:01:52 2009 Subject: Comparison of clustering systems? In-Reply-To: ; from Peter.Lindgren@experian.com on Thu, Nov 15, 2001 at 01:54:34PM -0600 References: Message-ID: <20011115161110.A1377@node0.opengeometry.ca>

On Thu, Nov 15, 2001 at 01:54:34PM -0600, Peter Lindgren wrote:
> Top500 has a poll asking "what cluster system do you use?". They offer
> the choices:
>
> Oscar
> Scyld
> Score
> SCE
> NPACI Rocks
> MSC.Linux
> Other

What.. no Mosix?

--
William Park, Open Geometry Consulting, . 8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin

From france at handhelds.org Thu Nov 15 15:57:43 2001 From: france at handhelds.org (George France) Date: Wed Nov 25 01:01:52 2009 Subject: Compile farm? In-Reply-To: References: Message-ID: <01111518574301.18513@shadowfax.middleearth>

Greetings,

Install pvm; there are patches for 'gnu make' to use pvm, then just do a "make -j". Simple, easy, and it works for me on i686, alpha and the ARM arch.

Best Regards,

--George

On Friday 02 November 2001 14:25, Scott Thomason wrote:
> Greetings. I'm interested in setting up a shell account/batch
> process/compile farm system for our developers, and I'm wondering if
> Beowulf clusters are well suited to that task.
> We're not interested in writing parallel code using PVM or MPI, we just
> want to log into what appears to be one big server and have it dispatch
> the workload amongst the slave processors. Is Beowulf good at that?
> ---scott
>
> p.s. Sorry if there are duplicates of this message; I used the wrong email
> address earlier.
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

From erayo at cs.bilkent.edu.tr Fri Nov 16 04:22:00 2001 From: erayo at cs.bilkent.edu.tr (Eray Ozkural (exa)) Date: Wed Nov 25 01:01:52 2009 Subject: Comparison of clustering systems? In-Reply-To: <20011115161110.A1377@node0.opengeometry.ca> References: <20011115161110.A1377@node0.opengeometry.ca> Message-ID:

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On Thursday 15 November 2001 23:11, William Park wrote:
> > Oscar
> > Scyld
> > Score
> > SCE
> > NPACI Rocks
> > MSC.Linux
> > Other
>
> What.. no Mosix?

Perhaps they're talking about HPC solutions only.

- -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara www: http://www.cs.bilkent.edu.tr/~erayo GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE79QTofAeuFodNU5wRAoEIAJ9vHp/aSKACsRnJg4GFL8a/N/P+GgCfQZU5
WLAp1PsFnnwMPndg4lX5UwY=
=/AGJ
-----END PGP SIGNATURE-----

From rgb at phy.duke.edu Fri Nov 16 05:45:38 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:52 2009 Subject: Public slaves In-Reply-To: Message-ID:

On Thu, 15 Nov 2001, Andrew B. Raij wrote:

> I've heard much about Scyld's cluster management tools, so I thought it
> made sense to stick with Scyld and modify things to fit my situation. If
> I were to use kickstart and a standard Linux distro, what would I be
> losing from Scyld?

A better way to put it would be: what do you need from Scyld on your cluster? As you say, Scyld has cluster management tools and so forth, but clusters existed for years before Scyld and it isn't too hard to set up a cluster without it. Indeed, if your cluster is intended to be a compute farm where you would like folks to be able to log in to nodes one at a time by name to do work (which seems quite possible if you want to give explicit nodes permanent names), then Scyld is quite possibly not your best bet, as it follows the "true beowulf" paradigm of the cluster being a single-headed virtual parallel supercomputer, where you would no more log in to a node than you would log in to a specific processor in a big SMP box.

I will echo Bill's suggestion, as it is how we set up our clusters here as well (they are primarily compute farms used to run many instances of embarrassingly parallel code for e.g. Monte Carlo or nuclear theory computations, generating slices of ongoing collision processes, for example). The engineering of the cluster is pretty simple:

a) Server(s) provide(s) shared NFS mounts to nodes for users, DHCP for nodes, NFS or FTP or HTTP export of e.g. RH distro and kickstart files.

b) Build a kickstart file for a "typical node". I can give you the one we use here if you need it.
We make the nodes relatively "fat", since they have small local hard disks and "small" local hard disks are currently so absurdly large that you could drop three or four completely different OS installations on them and still have room for swap and twenty GB of user scratch space. In fact, you could easily install RH AND Scyld on the nodes and select which way you wanted to boot the cluster at boot time. It's just a matter of how you choose to partition -- save a 4 GB partition per boot. The kickstart file specifies how the node disk is to be laid out, packages to be installed, what (if any) video support, and more, culminating in a post-install script that can be run to "polish" the setup -- installing the appropriate /etc/passwd, /etc/shadow, /etc/group, building /etc/fstab, and so forth.

c) Set up dhcpd.conf on the dhcp server. Here is a typical node entry for my "ganesh" cluster:

host g01 {
    hardware ethernet 00:01:03:BD:C5:7a;
    fixed-address 152.3.182.155;
    next-server install.phy.duke.edu;
    filename "/export/install/linux/rh-7.1/ks/beowulf";
    option domain-name "phy.duke.edu";
    option dhcp-class-identifier "PXEClient";
}

Note that this maps one MAC address to one IP number (in many cases one would assign node addresses out of a private internal network space like 192.168.x.x -- these nodes for the time being are publicly accessible and secured like any other workstation). One defines the name of the server to be used by name or IP number. Elsewhere there are global definitions for things like NIS servers, nameservers, and the like, so the booted host knows how to resolve the name. filename gives the path to the kickstart file that will then direct the install. If one wishes to provide it from a web or ftp server, prepend the appropriate http://. The other options are local (and hopefully obvious in purpose). This particular node has PXE booting set up and can be installed by just turning it on. Without this, one probably needs a boot floppy from the matching distribution and a floppy drive per node.

Once these things are set up, one merely boots the system. If you use a boot floppy, just enter "ks" at the boot prompt when requested, OR cut a custom boot floppy where ks is the default (I generally do this for nodes, as it means that you don't need a monitor or keyboard to reinstall). Otherwise it is pretty much just turn it on. On a good day, it will boot, find the dhcp server, get an IP number and identity, and start building, loading, mounting "install" ramdisks as fast as the network and server load permit. (If PXE booting, it does all this but in a somewhat different order, as it has to get the boot kernel over the network first.) It then rips through the kickstart file's instructions (partition and format the disk, install swap, and start installing packages). When finished, it runs the post script, which can end with instructions to reboot the newly installed node ready for production.

On a good day, we can reinstall nodes in about 3-4 minutes. In fact, when I give folks a tour of our cluster, I generally include a reinstall of a node just to show them how trivial it is. We keep (or rather can build dynamically on demand) a special "install" lilo.conf file on the systems so that we can even reinstall them remotely from a script -- copy in the install lilo.conf, run lilo, reboot (system installs and reboots into operational mode).
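That remote reinstall takes only a few lines of shell. A rough sketch under assumed names (lilo.install.conf, the node list, and password-less root ssh are all illustrative assumptions, not rgb's actual script):

    #!/bin/sh
    # push each node through a fresh kickstart install: point lilo at the
    # "install" configuration and reboot; the node rebuilds itself and
    # comes back up in operational mode.
    for node in g01 g02 g03; do
        ssh root@$node 'lilo -C /etc/lilo.install.conf && shutdown -r now' &
    done
    wait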
An impressive display of the scalability of modern Linux distributions, since exactly the same trick will work for every workstation in an organization. To manage a network, one only needs to "work" on servers (as it should be). Nodes, workstation clients, desktops, all of them should be complete kickstart boilerplate with minimal customization, all encapsulated in a (possibly host-specific) kickstart file. If one crashes or becomes corrupt or is cracked, a three or four minute reboot and it has a clean, fresh install.

Regarding parallel computing support, of course your kickstart file can contain e.g. MPI(s) of your preferred flavor(s), PVM, and so forth. It can also include at least the standard remote workstation management tools, e.g. syslog-ng, and perhaps a few that are more cluster management/monitoring tools, although there is indeed still a bit of a dearth of these in the mainstream linuces. You have to decide whether you are willing to live with these tools in order to have nodes that look like remote-access workstations, or would prefer Scyld's paradigm of nodes that look like multiple processors on a single system (with matching "single system" management tools).

Or both. Set it up to boot both ways on demand, and see which one works better for you. Neither one is particularly difficult to build and configure, and the time you save making the truly correct decision for your enterprise will likely pay for the time you spend figuring out the truly correct decision to make.

rgb

>
> -Andrew
>
> On Thu, 15 Nov 2001, William T. Rankin wrote:
>
> > > From: "Andrew B. Raij"
> > >
> > > Hi everybody,
> > >
> > > I'd like to set up a Scyld cluster with slaves open to the public
> > > network. I'd also like each slave to get the same IP of my choosing every
> > > time it is booted, and slave IPs shouldn't have to be confined to any
> > > specific range. I understand that doing this is contradictory to the
> > > Beowulf design, but is it possible?
> >
> > What you are talking about is to set up all the nodes as general
> > purpose workstations and using them as a cluster. This isn't
> > "contrary" to the beowulf design (that's how my first cluster was
> > set up). It is contrary IIRC to the basic Scyld assumptions.
> >
> > Have you considered just using kickstart with a standard linux
> > distribution to configure your machines? Or is there something
> > specific to Scyld that you are interested in?
> >
> > -bill
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu

From mcosta at fc.up.pt Fri Nov 16 09:16:59 2001 From: mcosta at fc.up.pt (Miguel Costa) Date: Wed Nov 25 01:01:52 2009 Subject: Rexec to Scyld nodes Message-ID: <3BF54A0B.5020608@fc.up.pt>

Hello everybody, I just got Scyld from Linux Central and tried it on a cluster of five dual Athlon machines. Everything works fine when I run MPI applications, and I really like this "one big SMP machine" design, as opposed to a cluster of workstations, but I also need other people to be able to run independent (non-MPI) programs on the nodes.

How can these processes be started on the nodes (without the users including bproc routines in their code)?
Can they use something like rexec, or do I have to switch between Scyld Beowulf and Red Hat workstations depending on what I want to do? I already have Red Hat installed on all the machines, but it would be better if I didn't have to reboot between "parallel multicomputer" and "computing farm".

hope this is clear, regards, Miguel Costa University of Porto Portugal

From eswardev at yahoo.com Fri Nov 16 10:15:01 2001 From: eswardev at yahoo.com (Eswar Dev) Date: Wed Nov 25 01:01:52 2009 Subject: Pgi Atlas on Linux_Athlon Message-ID: <20011116181501.82208.qmail@web14310.mail.yahoo.com>

Hi! I am getting bad results with PGI-ATLAS on a Linux-based Athlon. I need to get more speedup than what it shows here. Has anyone had similar problems? Help needed. Thanks!!!!

-Eswarkumar visit:http://atlantis.engr.odu.edu:8080
______________________________________________________

This is for ./xsl3blastst
------------- GEMM ----------------------------------
TST# A B    M    N    K ALPHA  LDA  LDB  BETA  LDC   TIME  MFLOP  SpUp  TEST
==== = = ==== ==== ==== ===== ==== ==== ===== ====  =====  =====  ====  ====
   0 N N  100  100  100   1.0 1000 1000   1.0 1000   0.00    0.0  1.00 -----
   0 N N  100  100  100   1.0 1000 1000   1.0 1000   0.00    0.0  0.00  PASS
   1 N N  200  200  200   1.0 1000 1000   1.0 1000   0.06  266.7  1.00 -----
   1 N N  200  200  200   1.0 1000 1000   1.0 1000   0.06  266.7  1.00  PASS
   2 N N  300  300  300   1.0 1000 1000   1.0 1000   0.28  192.9  1.00 -----
   2 N N  300  300  300   1.0 1000 1000   1.0 1000   0.21  257.1  1.33  PASS
   3 N N  400  400  400   1.0 1000 1000   1.0 1000   0.91  140.7  1.00 -----
   3 N N  400  400  400   1.0 1000 1000   1.0 1000   0.51  251.0  1.78  PASS
   4 N N  500  500  500   1.0 1000 1000   1.0 1000   2.07  120.8  1.00 -----
   4 N N  500  500  500   1.0 1000 1000   1.0 1000   0.99  252.5  2.09  PASS
   5 N N  600  600  600   1.0 1000 1000   1.0 1000   3.85  112.2  1.00 -----
   5 N N  600  600  600   1.0 1000 1000   1.0 1000   1.69  255.6  2.28  PASS
   6 N N  700  700  700   1.0 1000 1000   1.0 1000   6.26  109.6  1.00 -----
   6 N N  700  700  700   1.0 1000 1000   1.0 1000   2.70  254.1  2.32  PASS
   7 N N  800  800  800   1.0 1000 1000   1.0 1000   9.44  108.5  1.00 -----
   7 N N  800  800  800   1.0 1000 1000   1.0 1000   4.03  254.1  2.34  PASS
   8 N N  900  900  900   1.0 1000 1000   1.0 1000  13.47  108.2  1.00 -----
   8 N N  900  900  900   1.0 1000 1000   1.0 1000   5.72  254.9  2.35  PASS
   9 N N 1000 1000 1000   1.0 1000 1000   1.0 1000  18.39  108.8  1.00 -----
   9 N N 1000 1000 1000   1.0 1000 1000   1.0 1000   7.87  254.1  2.34  PASS

10 tests run, 10 passed

__________________________________________________ Do You Yahoo!? Find the one for you at Yahoo! Personals http://personals.yahoo.com

From agrajag at scyld.com Fri Nov 16 19:23:57 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:52 2009 Subject: Rexec to Scyld nodes In-Reply-To: <3BF54A0B.5020608@fc.up.pt>; from mcosta@fc.up.pt on Fri, Nov 16, 2001 at 05:16:59PM +0000 References: <3BF54A0B.5020608@fc.up.pt> Message-ID: <20011116222357.A12504@blueraja.scyld.com>

On Fri, 16 Nov 2001, Miguel Costa wrote:

> How can these processes be started on the nodes (without the users
> including bproc routines in their code)?
>
> Can they use something like rexec, or do I have to switch between Scyld
> Beowulf and Red Hat workstations depending on what I want to do?

Are you talking about rexec the program or the function call? If you are talking about the program, you should be able to use bpsh instead. If it's the function call, you should be able to use bproc_execmove() instead. This is a BProc function call, but it isn't any more complicated than the rexec() function call.

Sean
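As a concrete illustration of the bpsh route Sean mentions, independent jobs can be fanned out from an ordinary shell; the node numbers and program names here are examples only, and input files are assumed visible on the nodes (e.g. over NFS):

    # run ./myjob on node 3, with its output coming back to your terminal
    bpsh 3 ./myjob input.dat

    # one independent job per node, from a plain shell loop
    for n in 0 1 2 3 4; do
        bpsh $n ./myjob run$n.dat > run$n.log 2>&1 &
    done
    wait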
From rajkumar at csse.monash.edu.au Sat Nov 17 01:34:42 2001 From: rajkumar at csse.monash.edu.au (Rajkumar Buyya) Date: Wed Nov 25 01:01:52 2009 Subject: Cluster books in Chinese Message-ID: <3BF62F32.A2D9BA65@csse.monash.edu.au>

Dear All, I am pleased to inform you that our two-volume book on High Performance Cluster Computing, http://www.csse.monash.edu.au/~rajkumar/cluster/, published by Prentice Hall, USA (English version), is now available in the Chinese language: http://www.csse.monash.edu.au/~rajkumar/cluster/chinese/ published by the Publishing House of Electronics Industry (PHEL), Beijing, China.

Hopefully this helps in enhancing the adoption and usage of cluster technologies in Chinese regions -- I was told that the book is available in China at affordable cost ($20 for both volumes).

-- Best regards, Raj

PS: Both versions have a chapter on "Beowulf"!

------------------------------------------------------------------------
Rajkumar Buyya
School of Computer Science and Software Engineering
Monash University, C5.41, Caulfield Campus
Melbourne, VIC 3145, Australia
Phone: +61-3-9903 1969 (office); +61-3-9571 3629 (home)
Fax: +61-3-9903 2863; eFax: +1-801-720-9272
Email: rajkumar@buyya.com | rajkumar@csse.monash.edu.au
URL: http://www.buyya.com | http://www.csse.monash.edu.au/~rajkumar
------------------------------------------------------------------------

From zadok at phreaker.net Thu Nov 1 14:28:47 2001 From: zadok at phreaker.net (Hereward Cooper) Date: Wed Nov 25 01:01:52 2009 Subject: [ot] Re: AMD testing In-Reply-To: References: Message-ID: <20011101222847.1136f542.zadok@phreaker.net>

[I'm off-list so please reply to me directly as well as the list]

Has any user of the Tiger MP S2460 had experience of what happens if you DON'T use registered memory? Will it blow up :-) ??

Thanks, Hereward

From SThomaso at phmining.com Thu Nov 1 16:25:08 2001 From: SThomaso at phmining.com (Scott Thomason) Date: Wed Nov 25 01:01:52 2009 Subject: Compile farm? Message-ID:

Greetings. I'm interested in setting up a shell account/batch process/compile farm system for our developers, and I'm wondering if Beowulf clusters are well suited to that task. We're not interested in writing parallel code using PVM or MPI, we just want to log into what appears to be one big server and have it dispatch the workload amongst the slave processors. Is Beowulf good at that?
---scott

From ron_chen_123 at yahoo.com Fri Nov 2 11:44:54 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:52 2009 Subject: Compile farm? Message-ID: <20011102194454.28360.qmail@web14707.mail.yahoo.com>

What you need is a batch system. There are two free batch systems, SGE and PBS. Both of them are open source, but nevertheless you can get 7x24 support if you are willing to pay.

PBS: www.openpbs.com www.pbspro.com
SGE: www.sun.com/gridware gridengine.sunsource.net

Also, SGE has qmake, which can execute several instances of make on multiple machines for one single make job. Install note: http://supportforum.sun.com/gridengine/appnote_install.html

-Ron

--- Scott Thomason wrote:
> Greetings. I'm interested in setting up a shell
> account/batch process/compile farm system for our
> developers, and I'm wondering if Beowulf clusters
> are well suited to that task. We're not interested
> in writing parallel code using PVM or MPI, we just
> want to log into what appears to be one big server
> and have it dispatch the workload amongst the slave
> processors. Is Beowulf good at that?
> ---scott
>
> p.s. Sorry if there are duplicates of this message;
> I used the wrong email address earlier.
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or
> unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

__________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
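To make the batch-system route Ron describes concrete: under PBS a compile job is just a shell script with scheduler directives at the top. A minimal generic sketch (the job name, resource request and build command are invented for illustration):

    #!/bin/sh
    #PBS -N nightly-build
    #PBS -l nodes=1
    #PBS -j oe
    # runs on whichever node the scheduler picks;
    # stdout/stderr come back as nightly-build.o<jobid>
    cd $PBS_O_WORKDIR
    make -j 2 all

Submit it with "qsub build.sh" and watch it with "qstat"; the batch system, not the user, decides which slave processor does the work, which is exactly the "one big server" behavior Scott asked for.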
From jhalpern at howard.edu Fri Nov 2 18:30:33 2001 From: jhalpern at howard.edu (Joshua Halpern) Date: Wed Nov 25 01:01:52 2009 Subject: Gigabit Ethernet switches and network adaptors. References: Message-ID: <3BE356C9.D5353B6B@howard.edu>

I am in the process of specifying components for a small (8-node, Athlon-based) cluster. In searching the net I came across a reasonably priced 8-port copper gigabit switch and network adapters.

I have a friend who says that the things that he makes are inexpensive, not cheap. Does anyone know whether the following are inexpensive, or just cheap? Any experience with them?

TrendNet TEG-S80TX 8-port unmanaged switch - $799
http://www.trendware.com/products/TEG-S80TX.htm
Price: http://www.csocomputers.com/Hardware/Networking/Trendnet/Gigabit.htm
Review: http://www.8wire.com/articles/?aid=2300

TrendNet PCITX 32-bit PCI network adapter - $69
http://www.trendware.com/products/TEG-PCITX.htm

or the

Accton EN1408T 32-bit PCI network adapter - $99
Review: http://www.8wire.com/articles/index.asp?AID=2212
http://www.8wire.com/articles/index.asp?AID=2280
> > --javier > > -- > Kate Stevensen sagt: Meine Mission ist geheim! Finde es raus! > http://www.sunrise.net/exclude/track/action.asp?PID_S=592&PID_T=593&LID=1 > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Conrad.Geiger at Sun.COM Thu Nov 8 10:17:01 2001 From: Conrad.Geiger at Sun.COM (Conrad Geiger - Sun Academic Region HPC Technologist) Date: Wed Nov 25 01:01:52 2009 Subject: SGE - Grid Engine Training at SC2001, November 12 In-Reply-To: "Your message with ID" Message-ID: For those that are attending SC2001, there is a free Grid Engine (SGE) training session available for you. If you are interested in this open source Beowulf job management system and would like to attend, please email me and show up at the Denver location and time listed below: Class: SGE (Grid Engine) training Date: Monday, November 12 Time: 1:00 p.m. - 4:00 p.m. Classroom location: Colorado Ballroom F Marriott Hotel, 1701 California Street, Denver (near Denver Convention Center) AGENDA GRID ENGINE (SGE) TECHNICAL PRESENTATION: Sun Grid Engine (1 hour) -- overview of concepts -- installation options -- architecture -- information flow -- scheduling -- complexes and resource management -- parallel and checkpointing Examples (30 minutes) -- complexes -- load sensor -- license management -- immediate vs. low priority jobs SGE/EE technology (15 minutes) -- tickets -- share tree, functional, deadline, override Grid Engine Integration with ClusterTools (20 minutes) Grid Engine Open Source Project and API initiative (20 minutes) Conrad.Geiger@Sun.COM >----------------Begin Forwarded Message----------------< From: Ron Chen Subject: Re: Compile farm? To: Scott Thomason , Beowulf@beowulf.org Date: Fri, 2 Nov 2001 11:44:54 -0800 (PST) What you need is a batch system. There are 2 free batch systems, SGE and PBS. Both of them are opensource, but nevertheless, you can get 7x24 support if you are willing to pay. PBS: www.openpbs.com www.pbspro.com SGE: www.sun.com/gridware gridengine.sunsource.net Also, SGE has qmake, which can execute several instances of make on mutliple machines for one single make job. Install note: http://supportforum.sun.com/gridengine/appnote_install.html -Ron --- Scott Thomason wrote: > Greetings. I'm interested in setting up a shell > account/batch process/compile farm system for our > developers, and I'm wondering if Beowulf clusters > are well suited to that task. We're not interested > in writing parallel code using PVM or MPI, we just > want to log into what appears to be one big server > and have it dispatch the workload amongst the slave > processors. Is Beowulf good at that? > ---scott > > p.s. Sorry if there are duplicates of this message; > I used the wrong email address earlier. > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Find a job, post your resume. 
http://careers.yahoo.com

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

>----------------End Forwarded Message----------------<

From dabige1 at yahoo.com Fri Nov 9 09:20:10 2001 From: dabige1 at yahoo.com (Elie Bitton) Date: Wed Nov 25 01:01:52 2009 Subject: Starfire on RedHat 7.2 Message-ID: <008301c16942$c9fe2a80$0216a8c0@ebitton>

Hi, I was trying to get my quad card (Adaptec ANA-62044) working under RedHat 7.2, and after RedHat did not automatically detect it, I found your site as pretty much the only source of information on this card. Anyway... a question:

When I do an insmod starfire with both the starfire.o that came with RedHat 7.2 (/lib/modules/2.4.7-10/kernel/drivers/net/starfire.o) and the one I compiled from your site (http://www.scyld.com) from starfire.c (I compiled pci-scan.c, loaded pci-scan.o with insmod with no errors, and I compiled both of these with the -I flag as mentioned on your site, but had to use slab.h instead of malloc.h per the compiler's suggestion), it gives me the following error:

"starfire.o: init_module: No such device
Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters"

I don't know what base IO address or IRQ the card is using. Is there a way to find out (without having to install any form of Windows in another partition)? I also compiled your starfire-diag tool, but again, I can't test the card if I don't know the IO address.

Hoping you can help, Regards, Elie. I am not on the list... so please e-mail me back privately dabige1@yahoo.com Thanks.
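On finding a card's I/O base and IRQ without Windows: the kernel's PCI layer already reports this. A generic sketch (the grep pattern is just an example; match on whatever vendor string your card reports):

    # on 2.2/2.4 kernels, the kernel's own view of the PCI bus:
    cat /proc/pci

    # or, with the pciutils package installed, verbose per-device info
    # including I/O ports, memory ranges and the assigned IRQ:
    lspci -v | grep -A4 -i adaptec

If the board does not appear in /proc/pci at all, the driver's "init_module: No such device" makes sense: the kernel never saw the card on the bus in the first place.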
From okeefe at sistina.com Sun Nov 11 10:45:26 2001 From: okeefe at sistina.com (Matt Okeefe) Date: Wed Nov 25 01:01:52 2009 Subject: White Paper on Storage Clustering for Beowulfs Message-ID: <20011111124526.A5938@sistina.com>

Hi, there is a new white paper up on the Sistina web page entitled "Accelerating Technical Computing with Sistina's GFS Based Storage Clusters". You can get this paper at our home page: www.sistina.com. It actually covers more than just technical computing, but that is the focus. Feedback and criticism are of course welcome.

Matt O'Keefe CTO, Sistina Software, Inc.

From yoon at bh.kyungpook.ac.kr Sun Nov 18 21:54:32 2001 From: yoon at bh.kyungpook.ac.kr (Yoon Jae Ho) Date: Wed Nov 25 01:01:52 2009 Subject: Pgi Atlas on Linux_Athlon References: <20011116181501.82208.qmail@web14310.mail.yahoo.com> Message-ID: <000a01c170be$b1c8ab60$5f72f2cb@LocalHost>

I found that your test matrix size is very small (the matrix size N is 1000 x 1000). That is why you got such poor results with PGI-ATLAS on a Linux-based Athlon; it is not related to the PGI compiler, the Athlon CPU, or the network. Will you increase your matrix size to 3000 x 3000 or 5000 x 5000, if you have enough local memory (RAM)?

Have a nice day!

Yoon Jae Ho, Seoul, Korea

----- Original Message ----- From: Eswar Dev To: Sent: Saturday, November 17, 2001 3:15 AM Subject: Pgi Atlas on Linux_Athlon

> Hi! I am getting bad results with PGI-ATLAS on a Linux-based Athlon.
> I need to get more speedup than what it shows here.
> Has anyone had similar problems? Help needed. Thanks!!!!
> -Eswarkumar
> visit:http://atlantis.engr.odu.edu:8080
> ______________________________________________________
>
> This is for ./xsl3blastst
> ------------- GEMM ----------------------------------
> TST# A B    M    N    K ALPHA  LDA  LDB  BETA  LDC   TIME  MFLOP  SpUp  TEST
> ==== = = ==== ==== ==== ===== ==== ==== ===== ====  =====  =====  ====  ====
>    0 N N  100  100  100   1.0 1000 1000   1.0 1000   0.00    0.0  1.00 -----
>    0 N N  100  100  100   1.0 1000 1000   1.0 1000   0.00    0.0  0.00  PASS
>    1 N N  200  200  200   1.0 1000 1000   1.0 1000   0.06  266.7  1.00 -----
>    1 N N  200  200  200   1.0 1000 1000   1.0 1000   0.06  266.7  1.00  PASS
>    2 N N  300  300  300   1.0 1000 1000   1.0 1000   0.28  192.9  1.00 -----
>    2 N N  300  300  300   1.0 1000 1000   1.0 1000   0.21  257.1  1.33  PASS
>    3 N N  400  400  400   1.0 1000 1000   1.0 1000   0.91  140.7  1.00 -----
>    3 N N  400  400  400   1.0 1000 1000   1.0 1000   0.51  251.0  1.78  PASS
>    4 N N  500  500  500   1.0 1000 1000   1.0 1000   2.07  120.8  1.00 -----
>    4 N N  500  500  500   1.0 1000 1000   1.0 1000   0.99  252.5  2.09  PASS
>    5 N N  600  600  600   1.0 1000 1000   1.0 1000   3.85  112.2  1.00 -----
>    5 N N  600  600  600   1.0 1000 1000   1.0 1000   1.69  255.6  2.28  PASS
>    6 N N  700  700  700   1.0 1000 1000   1.0 1000   6.26  109.6  1.00 -----
>    6 N N  700  700  700   1.0 1000 1000   1.0 1000   2.70  254.1  2.32  PASS
>    7 N N  800  800  800   1.0 1000 1000   1.0 1000   9.44  108.5  1.00 -----
>    7 N N  800  800  800   1.0 1000 1000   1.0 1000   4.03  254.1  2.34  PASS
>    8 N N  900  900  900   1.0 1000 1000   1.0 1000  13.47  108.2  1.00 -----
>    8 N N  900  900  900   1.0 1000 1000   1.0 1000   5.72  254.9  2.35  PASS
>    9 N N 1000 1000 1000   1.0 1000 1000   1.0 1000  18.39  108.8  1.00 -----
>    9 N N 1000 1000 1000   1.0 1000 1000   1.0 1000   7.87  254.1  2.34  PASS
>
> 10 tests run, 10 passed

---------------------------------------------------------------------
Yoon Jae Ho Economist POSCO Research Institute yoon@bh.kyungpook.ac.kr jhyoon@mail.posri.re.kr
http://ie.korea.ac.kr/~supercom/ Korea Beowulf Supercomputer
http://members.ud.com/services/teams/team.htm?id=264C68D5-CB71-429F-923D-8614F419065D Help the people with your PC
Imagination is more important than knowledge. A. Einstein
http://www.kichun.co.kr 2001.1.6 http://www.c3tv.com 2001.1.10
------------------------------------------------------------------------

From ssy at prg.cpe.ku.ac.th Mon Nov 19 00:05:31 2001 From: ssy at prg.cpe.ku.ac.th (Somsak Sriprayoonsakul) Date: Wed Nov 25 01:01:52 2009 Subject: About SCE - Re: Comparison of clustering systems? Message-ID:

You can get an overview of the SCE distribution at http://www.opensce.org/doc/compaq.pdf.

Somsak

----- Original Message ----- From: "Peter Lindgren" To: Sent: Friday, November 16, 2001 2:54 AM Subject: Comparison of clustering systems?

> Top500 has a poll asking "what cluster system do you use?". They offer the choices:
>
> Oscar
> Scyld
> Score
> SCE
> NPACI Rocks
> MSC.Linux
> Other
>
> Follow this link to see current results:
> http://clusters.top500.org/pollbooth.php?qid=clustersys&aid=-1
>
> So far I've actually installed and gotten my application running with Scyld. I have most of the other systems on CD. I was about to try Rocks. But to try them all is going to take me a long time....
>
> I wonder whether anyone has recently done a comparison of these (or other) systems? (I also know of Cplant and IBM's CSK.)
> I found an NHSE article from 1996, but that's ancient history.
>
>
> Peter Lindgren
> Phone: 847 944 4515
> Fax: 847 517 5889
> E-mail: peter.lindgren@experian.com
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

--
---------------------------------------------
Somsak Sriprayoonsakul
Parallel Research Group
http://prg.cpe.ku.ac.th
---------------------------------------------

From SGaudet at turbotekcomputer.com  Mon Nov 19 07:00:46 2001
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Wed Nov 25 01:01:52 2009
Subject: Gigabit Ethernet switches and network adaptors.
Message-ID: <3450CC8673CFD411A24700105A618BD6170F69@911TURBO>

Hello Joshua,

> I am in the process of specifying components
> for a small (8 node, Athlon based) cluster. In
> searching the net I came across a reasonably
> priced 8 port copper Gigabit switch and
> network adapters.
>
> I have a friend who says that the things that he
> makes are inexpensive, not cheap. Does anyone
> know whether the following are inexpensive,
> or just cheap? Any experience with them?
>
> TrendNet TEG S80TX 8 port unmanaged switch - $799
> http://www.trendware.com/products/TEG-S80TX.htm
> Price:
> http://www.csocomputers.com/Hardware/Networking/Trendnet/Gigabit.htm
> Review: http://www.8wire.com/articles/?aid=2300
>
> TrendNet PCITX 32 bit PCI network adapter - $69
> http://www.trendware.com/products/TEG-PCITX.htm
>
> or the
>
> Accton EN1408T 32 Bit PCI network adapter - $99
> Review: http://www.8wire.com/articles/index.asp?AID=2212
> http://www.8wire.com/articles/index.asp?AID=2280

Here's some of the network GIG E hardware we'd like to recommend:

AceNIC/NetGear GA620(T)/3C985B
SysKonnect
NS chipset:
  Cameo SOHO-GA2000T SOHO-GA2500T
  D-Link DGE-500T
  PureData PDP8023Z-TG
  SMC SMC9462TX
  NetGear GA622

=============================================================

For switches you might also want to look into top lines like HP, 3Com and Cisco, because they offer government, educational, and medical (GEM) discounts. Their discounts are very aggressive. Cisco will also take trade-ins on old equipment.

Cheers,

Steve Gaudet
Linux Sales Engineer
..... <(???)>
===================================================================
| Turbotek Computer Corp.    tel:603-666-3062 ext. 21             |
| 8025 South Willow St.      fax:603-666-4519                     |
| Building 2, Unit 105       toll free:800-573-5393               |
| Manchester, NH 03103       e-mail:sgaudet@turbotekcomputer.com  |
| web: http://www.turbotekcomputer.com                            |
===================================================================

From lmeerkat at yahoo.com  Mon Nov 19 09:47:17 2001
From: lmeerkat at yahoo.com (L G)
Date: Wed Nov 25 01:01:52 2009
Subject: Bootload problems.
Message-ID: <20011119174717.34881.qmail@web20604.mail.yahoo.com>

Hi,

I'm building a Beowulf cluster and came across a problem on one of my machines. It is a PIII-133 with a 10GB hard drive and 256 MB. It has the following partitions:

hda1 -    1    1  Linux
hda2 -    2 1247  Extended
hda5 -    2 1181  Linux
hda6 - 1182 1247  Linux swap

I'm trying to boot this node from a boot floppy disk. When the first step is completely done, then all of a sudden the Linux boot starts to work. I can't figure out what's going on.
I tried to boot it with another order of partitions in the partition table, which was as follows:

hda1 -    1    1  Linux
hda2 -    2 1247  Extended
hda5 -    2   67  Linux swap
hda6 -   68 1247  Linux

and I received another error, which was "Cannot open root device 03:05"; after that the system just started to reboot. The same kind of partition table works fine on another of my machines. I tried to boot the node in question without any hard drive at all, but it didn't work either; it was still looking for a root partition.

Could you help me to solve this problem, please? Thanks.

Lyudmila Gritsenko
Software Developer
Absoft Corp.

=====
Best regards,
Meerkat.

__________________________________________________
Do You Yahoo!?
Find the one for you at Yahoo! Personals http://personals.yahoo.com

From kinghorn at pqs-chem.com  Mon Nov 19 10:03:35 2001
From: kinghorn at pqs-chem.com (Donald B. Kinghorn)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
Message-ID: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net>

... Well, it won't blow up with a kaboom ... According to the Mushkin web site you can use 2 non-registered modules on the Tiger 2460 ...

Now, I'll give my opinion. This board is poorly engineered and has been a pain to make suitable for scientific computing. I've had numerous memory problems with these boards, and most of the problems don't show up as obvious crashes. They are the worst kind of errors -- corrupted results for large jobs -- the kind of thing you might not catch without careful testing. I have found that I cannot reliably use more than 3 memory modules (of any type) on these boards. If you need a 1GB configuration, DON'T USE 4 256MB MODULES. Try 2 512MB, or 2 256MB and 1 512MB module.

I have a cluster running reliably with no detectable errors under any load (FINALLY) using the Tiger 2460 with the 1.03 BIOS, using 2 256MB Crucial reg ECC modules and 1 Infineon (64x4) reg ECC 512MB module on each board. You need to ENABLE quick boot in the BIOS, use ECC SCRUB (maybe not essential), use a recent (>2.4.11) kernel with a stable vm, use an append noapic in lilo, and do NOT use any mem= appends in lilo. All of these steps may not be simultaneously needed, but I am sick of fighting with this motherboard and this configuration seems to work reliably. (Your mileage may vary.)

I'm looking forward to seeing some better dual Athlon boards come to the market. Does anyone have any info on when this may happen?

Best regards
-Don

>[i'm off list so please reply to me directly aswell as the list]
>
>Has any user of the Tiger MP S2460 had experience of what happens if you
>DON'T
>use registered memory? Will it blow up :-) ??
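For anyone wanting to replicate that lilo advice, a minimal /etc/lilo.conf sketch (the kernel image name and root device here are illustrative, not Don's exact setup; rerun /sbin/lilo after editing):

  image=/boot/vmlinuz-2.4.13     # illustrative kernel version
      label=linux
      root=/dev/hda1             # illustrative root device
      append="noapic"            # and no mem= override, per the advice above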
From Mark at MarkAndrewSmith.co.uk  Mon Nov 19 10:11:55 2001
From: Mark at MarkAndrewSmith.co.uk (Mark@MarkAndrewSmith.co.uk)
Date: Wed Nov 25 01:01:53 2009
Subject: Compile farm?
Message-ID: <61DC272A66B8D211BA8200105ADF2D3910E6FF@SERVER01>

Scott,

.... or if you don't need batch but would like some interaction without the use of any special libraries, you might like to have a little look at what the MOSIX guys are doing at http://www.mosix.org/ or http://www.mosix.cs.huji.ac.il/ (I'm not sure, but I think their website is down at the moment).

I am currently building a dev box for general use, and spreading compiles across nodes seems a good idea since you see the system as one big box. Then there is the added bonus that you don't need to recompile any code....!

Regards,
Mark.
Tel: (01942)722518
Mob: (07866)070122

-----Original Message-----
From: Ron Chen [SMTP:ron_chen_123@yahoo.com]
Sent: Monday 19 November 2001 07:00
To: Scott Thomason; Beowulf@beowulf.org
Subject: Re: Compile farm?

What you need is a batch system. There are 2 free batch systems, SGE and PBS. Both of them are open source, but nevertheless, you can get 7x24 support if you are willing to pay.

PBS:
www.openpbs.com
www.pbspro.com

SGE:
www.sun.com/gridware
gridengine.sunsource.net

Also, SGE has qmake, which can execute several instances of make on multiple machines for one single make job. Install note:
http://supportforum.sun.com/gridengine/appnote_install.html

-Ron

--- Scott Thomason wrote:
> Greetings. I'm interested in setting up a shell
> account/batch process/compile farm system for our
> developers, and I'm wondering if Beowulf clusters
> are well suited to that task. We're not interested
> in writing parallel code using PVM or MPI, we just
> want to log into what appears to be one big server
> and have it dispatch the workload amongst the slave
> processors. Is Beowulf good at that?
> ---scott
>
> p.s. Sorry if there are duplicates of this message;
> I used the wrong email address earlier.
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or
> unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

__________________________________________________
Do You Yahoo!?
Find a job, post your resume. http://careers.yahoo.com

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20011119/64e93fd1/attachment.html

From math at velocet.ca  Mon Nov 19 17:14:18 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
In-Reply-To: <3450CC8673CFD411A24700105A618BD6170F69@911TURBO>; from SGaudet@turbotekcomputer.com on Mon, Nov 19, 2001 at 10:00:46AM -0500
References: <3450CC8673CFD411A24700105A618BD6170F69@911TURBO>
Message-ID: <20011119201418.L66460@velocet.ca>

On Mon, Nov 19, 2001 at 10:00:46AM -0500, Steve Gaudet's all...
> Here's some of the network GIG E hardware we'd like to recommend:
>
> AceNIC/NetGear GA620(T)/3C985B
> SysKonnect
> NS chipset:
> Cameo SOHO-GA2000T SOHO-GA2500T
> D-Link DGE-500T
> PureData PDP8023Z-TG
> SMC SMC9462TX
> NetGear GA622

How do you find the performance of these NS83820 cards? Do they do block interrupt xfer or whatever it is for more efficient xfer? How much system/interrupt time do they chew up?

/kc

From wsb at paralleldata.com  Mon Nov 19 19:28:01 2001
From: wsb at paralleldata.com (W Bauske)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
References: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net>
Message-ID: <3BF9CDC1.EC92838A@paralleldata.com>

I must have gotten lucky. I built a test system, installed RH7.2 on it, and it works fine. I only put 2 512MB ECC DIMM's in it though so perhaps that's why I didn't have so much trouble. Also tried MP vs XP processors and both perform identically with this board.
It's the fastest Athlon system I've tested.

Wes

"Donald B. Kinghorn" wrote:
>
> ...
> We'll it won't blow up with a kaboom ... According to the Mushkin web site
> you can use 2 non-registered modules on the tiger 2460 ...
>
> Now, I'll give my opinion. This board is poorly enginered and has been a
> pain make suitable for scientific computing. I've had numerous memory
> problems with these boards and most of the problems don't show up as obvious
> crashes. They are the worst kind of errors -- corrupted results for large
> jobs --- the kind of thing you might not catch without carefull testing. I
> have found that I can not reliably use more than 3 memory modules (of any
> type) on these boards. If you need a 1GB configuration DON'T USE 4 256MB
> MODULES. Try 2 512MB or 2 256 and 1 512 module.
>
> I have a cluster running reliably with no detectable errors under any load
> (FINALLY) using the tiger 2460 with the 1.03 bios using 2 256 Crucial reg
> ecc modules and 1 Infenion (64x4) reg ecc 512MB module on each board. You
> need to ENABLE quick boot in the bios, use ECC SCRUB (maybe not essential),
> use a recent (>2.4.11) kernel with a stable vm, use an append noapic in
> lilo and do NOT use any mem= appends in lilo. All of these steps may not be
> simultaneously needed but I am sick of fighting with this motherboard and
> this configuration seems to work reliably. (your milage may vary)
>
> I'm looking forward to seeing some better dual athlon boards come to the
> market. Does any one have any info on when this may happen?
>
> Best regards
> -Don
>
> >[i'm off list so please reply to me directly aswell as the list]
>
> >Has any user of the Tiger MP S2460 had experience of what happens if you
> >DON'T
> >use registered memory? Will it blow up :-) ??
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lindahl at conservativecomputer.com  Mon Nov 19 20:17:48 2001
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
In-Reply-To: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net>; from kinghorn@pqs-chem.com on Mon, Nov 19, 2001 at 12:03:35PM -0600
References: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net>
Message-ID: <20011119231748.A2068@wumpus.foo>

On Mon, Nov 19, 2001 at 12:03:35PM -0600, Donald B. Kinghorn wrote:

> I have a cluster running reliably with no detectable errors under any load
> (FINALLY) using the tiger 2460 with the 1.03 bios using 2 256 Crucial reg
> ecc modules and 1 Infenion (64x4) reg ecc 512MB module on each board.

One thing that can kill you with memory is mixing dissimilar memory.

greg

From math at velocet.ca  Mon Nov 19 20:20:17 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
In-Reply-To: <3BF9CDC1.EC92838A@paralleldata.com>; from wsb@paralleldata.com on Mon, Nov 19, 2001 at 09:28:01PM -0600
References: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net> <3BF9CDC1.EC92838A@paralleldata.com>
Message-ID: <20011119232016.C89961@velocet.ca>

On Mon, Nov 19, 2001 at 09:28:01PM -0600, W Bauske's all...
>
> I must have gotten lucky. I built a test system, installed RH7.2 on it,
> and it works fine. I only put 2 512MB ECC DIMM's in it though so perhaps that's
> why I didn't have so much trouble. Also tried MP vs XP processors and both
> perform identically with this board.
> It's the fastest Athlon system I've tested.
>
> Wes

Have you tested your computations for subtle errors on it, compared with the MP processors? As said below there can be errors that are undetectable - how likely is this?

/kc

> "Donald B. Kinghorn" wrote:
> >
> > ...
> > We'll it won't blow up with a kaboom ... According to the Mushkin web site
> > you can use 2 non-registered modules on the tiger 2460 ...
> >
> > Now, I'll give my opinion. This board is poorly enginered and has been a
> > pain make suitable for scientific computing. I've had numerous memory
> > problems with these boards and most of the problems don't show up as obvious
> > crashes. They are the worst kind of errors -- corrupted results for large
> > jobs --- the kind of thing you might not catch without carefull testing. I
> > have found that I can not reliably use more than 3 memory modules (of any
> > type) on these boards. If you need a 1GB configuration DON'T USE 4 256MB
> > MODULES. Try 2 512MB or 2 256 and 1 512 module.
> >
> > I have a cluster running reliably with no detectable errors under any load
> > (FINALLY) using the tiger 2460 with the 1.03 bios using 2 256 Crucial reg
> > ecc modules and 1 Infenion (64x4) reg ecc 512MB module on each board. You
> > need to ENABLE quick boot in the bios, use ECC SCRUB (maybe not essential),
> > use a recent (>2.4.11) kernel with a stable vm, use an append noapic in
> > lilo and do NOT use any mem= appends in lilo. All of these steps may not be
> > simultaneously needed but I am sick of fighting with this motherboard and
> > this configuration seems to work reliably. (your milage may vary)
> >
> > I'm looking forward to seeing some better dual athlon boards come to the
> > market. Does any one have any info on when this may happen?
> >
> > Best regards
> > -Don
> >
> > >[i'm off list so please reply to me directly aswell as the list]
> >
> > >Has any user of the Tiger MP S2460 had experience of what happens if you
> > >DON'T
> > >use registered memory? Will it blow up :-) ??
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf@beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From wsb at paralleldata.com  Mon Nov 19 20:39:07 2001
From: wsb at paralleldata.com (W Bauske)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
References: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net> <3BF9CDC1.EC92838A@paralleldata.com> <20011119232016.C89961@velocet.ca>
Message-ID: <3BF9DE6B.E05B29A1@paralleldata.com>

Well, I always scan my results for the min/max values and so far they've been identical. Haven't actually subtracted the results to see if it changed. I'm picking up a couple more boards and should have them running next week. I can do more testing at that point. It's a pain to swap cpus at the moment.

Are you sure you don't have a heat problem? What sort of case are you using? Maybe your heatsink/fan is insufficient for your particular chip? There are some very high volume fan/HS combos available for Athlons. I'm using a 5400rpm fan/HS at the moment but have a couple 8000rpm fan/HS available if needed. They're awful noisy though and are a last resort.
Wes

Velocet wrote:
>
> On Mon, Nov 19, 2001 at 09:28:01PM -0600, W Bauske's all...
> >
> > I must have gotten lucky. I built a test system, installed RH7.2 on it,
> > and it works fine. I only put 2 512MB ECC DIMM's in it though so perhaps that's
> > why I didn't have so much trouble. Also tried MP vs XP processors and both
> > perform identically with this board. It's the fastest Athlon system I've tested.
> >
> > Wes
>
> Have you tested your computations for subtle errors on it, compared to
> with the MP processors? As said below there can be errors that
> are undetectable - how likely is this?
>
> /kc
>
> > "Donald B. Kinghorn" wrote:
> > >
> > > ...
> > > We'll it won't blow up with a kaboom ... According to the Mushkin web site
> > > you can use 2 non-registered modules on the tiger 2460 ...
> > >
> > > Now, I'll give my opinion. This board is poorly enginered and has been a
> > > pain make suitable for scientific computing. I've had numerous memory
> > > problems with these boards and most of the problems don't show up as obvious
> > > crashes. They are the worst kind of errors -- corrupted results for large
> > > jobs --- the kind of thing you might not catch without carefull testing. I
> > > have found that I can not reliably use more than 3 memory modules (of any
> > > type) on these boards. If you need a 1GB configuration DON'T USE 4 256MB
> > > MODULES. Try 2 512MB or 2 256 and 1 512 module.
> > >
> > > I have a cluster running reliably with no detectable errors under any load
> > > (FINALLY) using the tiger 2460 with the 1.03 bios using 2 256 Crucial reg
> > > ecc modules and 1 Infenion (64x4) reg ecc 512MB module on each board. You
> > > need to ENABLE quick boot in the bios, use ECC SCRUB (maybe not essential),
> > > use a recent (>2.4.11) kernel with a stable vm, use an append noapic in
> > > lilo and do NOT use any mem= appends in lilo. All of these steps may not be
> > > simultaneously needed but I am sick of fighting with this motherboard and
> > > this configuration seems to work reliably. (your milage may vary)
> > >
> > > I'm looking forward to seeing some better dual athlon boards come to the
> > > market. Does any one have any info on when this may happen?
> > >
> > > Best regards
> > > -Don
> > >
> > > >[i'm off list so please reply to me directly aswell as the list]
> > >
> > > >Has any user of the Tiger MP S2460 had experience of what happens if you
> > > >DON'T
> > > >use registered memory? Will it blow up :-) ??
> > >
> > > _______________________________________________
> > > Beowulf mailing list, Beowulf@beowulf.org
> > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf@beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Ken Chase, math@velocet.ca * Velocet Communications Inc.
* Toronto, CANADA

From math at velocet.ca  Mon Nov 19 21:12:11 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
In-Reply-To: <3BF9DE6B.E05B29A1@paralleldata.com>; from wsb@paralleldata.com on Mon, Nov 19, 2001 at 10:39:07PM -0600
References: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net> <3BF9CDC1.EC92838A@paralleldata.com> <20011119232016.C89961@velocet.ca> <3BF9DE6B.E05B29A1@paralleldata.com>
Message-ID: <20011120001211.D89961@velocet.ca>

On Mon, Nov 19, 2001 at 10:39:07PM -0600, W Bauske's all...
>
> Well, I always scan my results for the min/max values and so far
> they've been identical. Haven't actually subtracted the results
> to see if it changed. I'm picking up a couple more boards and
> should have them running next week. I can do more testing at that
> point. It's a pain to swap cpus at the moment.
>
> Are you sure you don't have a heat problem? What sort of case are
> you using? Maybe you're heatsink/fan are insufficient for your
> particular chip? There are some very high volume fan/HS combos
> available for Athlons. I'm using a 5400rpm fan/HS at the moment but
> have a couple 8000rpm fan/HS available if needed. They're awful
> noisy though and are a last resort.

I've had no problems personally. I was just trying to get a bead on what kind of problems others have had. So far for you and me they've worked perfectly, but I spent the extra $5 and got Registered ECC DDR ram (from Crucial it's a great price, why not). They do generate a lot of heat (2 cpus) and I have 5400 RPM fans but they're in a cabinet with a lot of airflow (not sure how many CFM, but a lot of it, at 66F).

/kc

> Wes
>
> Velocet wrote:
> >
> > On Mon, Nov 19, 2001 at 09:28:01PM -0600, W Bauske's all...
> > >
> > > I must have gotten lucky. I built a test system, installed RH7.2 on it,
> > > and it works fine. I only put 2 512MB ECC DIMM's in it though so perhaps that's
> > > why I didn't have so much trouble. Also tried MP vs XP processors and both
> > > perform identically with this board. It's the fastest Athlon system I've tested.
> > >
> > > Wes
> >
> > Have you tested your computations for subtle errors on it, compared to
> > with the MP processors? As said below there can be errors that
> > are undetectable - how likely is this?
> >
> > /kc
> >
> > > "Donald B. Kinghorn" wrote:
> > > >
> > > > ...
> > > > We'll it won't blow up with a kaboom ... According to the Mushkin web site
> > > > you can use 2 non-registered modules on the tiger 2460 ...
> > > >
> > > > Now, I'll give my opinion. This board is poorly enginered and has been a
> > > > pain make suitable for scientific computing. I've had numerous memory
> > > > problems with these boards and most of the problems don't show up as obvious
> > > > crashes. They are the worst kind of errors -- corrupted results for large
> > > > jobs --- the kind of thing you might not catch without carefull testing. I
> > > > have found that I can not reliably use more than 3 memory modules (of any
> > > > type) on these boards. If you need a 1GB configuration DON'T USE 4 256MB
> > > > MODULES. Try 2 512MB or 2 256 and 1 512 module.
> > > >
> > > > I have a cluster running reliably with no detectable errors under any load
> > > > (FINALLY) using the tiger 2460 with the 1.03 bios using 2 256 Crucial reg
> > > > ecc modules and 1 Infenion (64x4) reg ecc 512MB module on each board.
You > > > > need to ENABLE quick boot in the bios, use ECC SCRUB (maybe not essential), > > > > use a recent (>2.4.11) kernel with a stable vm, use an append noapic in > > > > lilo and do NOT use any mem= appends in lilo. All of these steps may not be > > > > simultaneously needed but I am sick of fighting with this motherboard and > > > > this configuration seems to work reliably. (your milage may vary) > > > > > > > > I'm looking forward to seeing some better dual athlon boards come to the > > > > market. Does any one have any info on when this may happen? > > > > > > > > Best regards > > > > -Don > > > > > > > > >[i'm off list so please reply to me directly aswell as the list] > > > > > > > > >Has any user of the Tiger MP S2460 had experience of what happens if you > > > > >DON'T > > > > >use registered memory? Will it blow up :-) ?? > > > > > > > > _______________________________________________ > > > > Beowulf mailing list, Beowulf@beowulf.org > > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > > > Beowulf mailing list, Beowulf@beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > > Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From jakob at unthought.net Mon Nov 19 22:03:39 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:53 2009 Subject: [ot] Re: AMD testing In-Reply-To: <20011101222847.1136f542.zadok@phreaker.net>; from zadok@phreaker.net on Thu, Nov 01, 2001 at 10:28:47PM +0000 References: <20011101222847.1136f542.zadok@phreaker.net> Message-ID: <20011120070339.M9896@unthought.net> On Thu, Nov 01, 2001 at 10:28:47PM +0000, Hereward Cooper wrote: > [i'm off list so please reply to me directly aswell as the list] > > Has any user of the Tiger MP S2460 had experience of what happens if you DON'T > use registered memory? Will it blow up :-) ?? It should work if you only use two memory modules (max). If you need more than two modules, you need registered memory (for all modules). Apparently the chipset can't drive more than two unregistered blocks. However, on the Tiger here I couldn't use two unregistered modules at all. The board made some sequence of "beeps" at power-on, wouldn't even POST. Using registered modules instead solved the problem. (Mmmmm.... 2x2785 BogoMIPS) Mine didn't blow up. But that's hardly proof that it can't happen, so let us know if yours blow up 8) -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
:.........................:............{Konkhra}...............:

From mcosta at fc.up.pt  Tue Nov 20 03:04:16 2001
From: mcosta at fc.up.pt (Miguel Costa)
Date: Wed Nov 25 01:01:53 2009
Subject: MPE parallel graphics
Message-ID: <3BFA38B0.3080406@fc.up.pt>

Hello again,

after finding enlightenment (no pun intended) on bpsh in my first post, I return to seek your help on a different topic:

On Scyld, when I use MPICH's mpicc with the flag -mpianim and then run the program, it displays a window with dots representing the CPUs, but then the nodes don't seem to be able to communicate with the master's X server and it crashes. Has anyone had this problem, or can anyone see why this is happening?

Thanks again,
regards,
miguel costa
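The first things to rule out here are the generic X11 ones; a sketch (not Scyld-specific; "master", the display number, and myprog are placeholders, and note that xhost + disables access control entirely, so use it only for testing):

  # on the master node, allow incoming X connections for the test
  xhost +
  # make sure the MPI processes inherit a display they can actually reach
  export DISPLAY=master:0
  mpirun -np 4 ./myprog    # myprog stands in for the -mpianim-built binary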
From SGaudet at turbotekcomputer.com  Tue Nov 20 08:06:29 2001
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
Message-ID: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO>

Hello,

> On Mon, Nov 19, 2001 at 10:00:46AM -0500, Steve Gaudet's all...
> > Here's some of the network GIG E hardware we'd like to recommend:
> >
> > AceNIC/NetGear GA620(T)/3C985B
> > SysKonnect
> > NS chipset:
> > Cameo SOHO-GA2000T SOHO-GA2500T
> > D-Link DGE-500T
> > PureData PDP8023Z-TG
> > SMC SMC9462TX
> > NetGear GA622
>
> How do you find the performance of these NS83820 cards? Do they do
> block interrupt xfer or whatever it is for more efficient
> xfer? How much
> system/interrupt time do they chew up?

The NS based stuff is low-end and cheap - you get what you pay for - but the drivers are rock solid and provide a good way to get started with GigE. On a high enough powered box, you might even get decent throughput, but it's at a cost of cycles.

The SysKonnect card, I'm told, is going EOL, so good news/bad news: they may start showing up on E-bay sometime. ;)

The most promising stuff is a Broadcom chip based 3Com card (3c996), but as of this moment, there's no Linux driver for it. That'll be the stuff to buy for next year, though.

Cheers,

Steve Gaudet
Linux Sales Engineer
..... <(???)>
===================================================================
| Turbotek Computer Corp.    tel:603-666-3062 ext. 21             |
| 8025 South Willow St.      fax:603-666-4519                     |
| Building 2, Unit 105       toll free:800-573-5393               |
| Manchester, NH 03103       e-mail:sgaudet@turbotekcomputer.com  |
| web: http://www.turbotekcomputer.com                            |
===================================================================

From ssy at prg.cpe.ku.ac.th  Tue Nov 20 08:43:04 2001
From: ssy at prg.cpe.ku.ac.th (Somsak Sriprayoonsakul)
Date: Wed Nov 25 01:01:53 2009
Subject: Compile farm?
Message-ID:

There is a program called 'ppmake' which uses a combination of make -j and PVM to distribute compilation threads to the nodes in a cluster. You can look for it at rpmfind.net or google.com (it used to be at http://www3.informatik.tu-muenchen.de/~zimmerms/ppmake/ but the link is down). You might use it with some batch scheduling systems that support PVM.

-------------------------------------------------------------------
Somsak Sriprayoonsakul
Parallel Research Group
Kasetsart University
ssy@prg.cpe.ku.ac.th
-------------------------------------------------------------------

From math at velocet.ca  Tue Nov 20 08:46:02 2001
From: math at velocet.ca ('math@velocet.ca')
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
In-Reply-To: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO>; from SGaudet@turbotekcomputer.com on Tue, Nov 20, 2001 at 11:06:29AM -0500
References: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO>
Message-ID: <20011120114601.K89961@velocet.ca>

> The NS based stuff is low-end and cheap - you get what you pay for - but the
> drivers are rock solid and provide a good way to get started with GigE.
>
> On a high enough powered box, you might even get decent throughput but it's
> at a cost of cycles.

Here's an interesting question: even with the fastest network interconnects (SCALI, etc), we don't see 100% scaling at large numbers of nodes. There is some free CPU left over. So what if you had slightly less efficient equipment? I realise it would cause a slowdown for sending out messages as the latency may be increased, but if the latency is the same as for the high end network equipment and it only costs more cycles, is it conceivable that the scaling and performance of this cluster with slightly less efficient equipment would be similar?

(Again, there's a big assumption here that we can find such equipment that has the same latency when extra cycles are involved, which may be the source of much latency for many cards in the first place.)

/kc

From lmeerkat at yahoo.com  Tue Nov 20 09:28:53 2001
From: lmeerkat at yahoo.com (L G)
Date: Wed Nov 25 01:01:53 2009
Subject: Beowulf Status Monitor information
Message-ID: <20011120172853.96241.qmail@web20606.mail.yahoo.com>

Hi,

I can't see any information in the Scyld Beowulf Status Monitor for one of my nodes; it says Memory - 0%, Swap - None, Disk - 0%, Network - 0 kBps. It is a Pentium with 64MB and a 2GB HD. This node is in the state "up" and I do have full access to it from the master machine. I think the problem is that it only has 64MB of RAM, am I right?

Thanks.

Lyudmila Gritsenko

=====

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month. http://geocities.yahoo.com/ps/info1

From Nirmal.Bissonauth at durham.ac.uk  Tue Nov 20 09:38:16 2001
From: Nirmal.Bissonauth at durham.ac.uk (Nirmal Bissonauth)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
In-Reply-To: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO>
Message-ID:

On Tue, 20 Nov 2001, Steve Gaudet wrote:

> The most promising stuff is a Broadcom chip based 3Com card (3c996), but as
> of this moment, there's no Linux driver for it. That'll be the stuff to buy
> for next year, though.
>
> Cheers,
>
> Steve Gaudet
> Linux Sales Engineer
> .....

Actually 3com has some linux drivers for the 3c996 card on their website. If anybody tries them, let us know if they are any good?

http://www.3com.com/products/en_US/result.jsp?selected=6&sort=effdt&sku=3C996-T&order=desc

Hope that was pasted ok.

Cheers
Nirmal Bissonauth

From okeefe at sistina.com  Tue Nov 20 10:46:57 2001
From: okeefe at sistina.com (Matt Okeefe)
Date: Wed Nov 25 01:01:53 2009
Subject: Compile farm?
In-Reply-To: <313680C9A886D511A06000204840E1CF3F0F3B@whq-msgusr-02.pit.comms.marconi.com>
References: <313680C9A886D511A06000204840E1CF3F0F3B@whq-msgusr-02.pit.comms.marconi.com>
Message-ID: <20011120124657.A6703@sistina.com>

On Wed, Nov 14, 2001 at 08:58:26AM -0500, Strange, John wrote:
> Well you can use mexec with mosix to get things to work, and it does work
> quite well but it doesn't scale because of some underlying filesystem
> problems we are having.
> I've got 25 machines, our backend storage currently is netapp filers, so
> using NFS I have to turn off client side caching. It basically crushes the
> filer doing constant file handling lookups. I'm still playing with a netapp
> that we have on spare, maybe I'll have some luck with finding a way around
> the problems that we are having.
>
> There is no really good backend filesystem that you can use, maybe GFS but
> it's still relatively new and too bleeding edge for practical use. (IMHO)

Actually there are a fair number of people using it in production, in some cases for nearly a year; I can give you references if you like. None have complained of data corruption due to GFS.

> Plus we don't have the hardware for it (fiber channel) and we have *NO*
> budget.

The next release of GFS will include an improved shared IP network block driver, called GNBD. You can run it over Ethernet or Myrinet, or whatever network you have.

Matt O'Keefe
Sistina Software, Inc.

> If anyone has any suggestions I would be glad to hear them.
>
> Thanks,
>
> John Strange
> Marconi
> john.ws.strange.at.marconi.com
>
> -----Original Message-----
> From: Scott Thomason [mailto:SThomaso@phmining.com]
> Sent: Friday, November 02, 2001 2:25 PM
> To: Beowulf@beowulf.org
> Subject: Compile farm?
>
> Greetings. I'm interested in setting up a shell account/batch
> process/compile farm system for our developers, and I'm wondering if Beowulf
> clusters are well suited to that task. We're not interested in writing
> parallel code using PVM or MPI, we just want to log into what appears to be
> one big server and have it dispatch the workload amongst the slave
> processors. Is Beowulf good at that?
> ---scott
>
> p.s. Sorry if there are duplicates of this message; I used the wrong email
> address earlier.
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From wsb at paralleldata.com  Tue Nov 20 12:55:07 2001
From: wsb at paralleldata.com (W Bauske)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
References: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO>
Message-ID: <3BFAC32B.4282F6D6@paralleldata.com>

Steve Gaudet wrote:
>
> Hello,
>
> > On Mon, Nov 19, 2001 at 10:00:46AM -0500, Steve Gaudet's all...
> > > Here's some of the network GIG E hardware we'd like to recommend:
> > >
> > > AceNIC/NetGear GA620(T)/3C985B
> > > SysKonnect
> > > NS chipset:
> > > Cameo SOHO-GA2000T SOHO-GA2500T
> > > D-Link DGE-500T
> > > PureData PDP8023Z-TG
> > > SMC SMC9462TX
> > > NetGear GA622
> >
> > How do you find the performance of these NS83820 cards? Do they do
> > block interrupt xfer or whatever it is for more efficient
> > xfer? How much
> > system/interrupt time do they chew up?
>
> The NS based stuff is low-end and cheap - you get what you pay for - but the
> drivers are rock solid and provide a good way to get started with GigE.
>
> On a high enough powered box, you might even get decent throughput but it's
> at a cost of cycles.

If you consider 57MB/sec for a $45 card bad, then the ns83820's are a bad deal. I consider that a good buy. CPU is around 30% of a P4 1.5GHz system, both sending and receiving sides.
It would be nice if the cpu was lower but it's acceptable for what I do. (YMMV)

Also would be nice if Gbe switches were cheaper. Latest pricing I see is $600 for an 8 port switch, or $75/port+$45/card for $120 per connection.

Wes

From math at velocet.ca  Tue Nov 20 13:25:51 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
In-Reply-To: <3BFAC32B.4282F6D6@paralleldata.com>; from wsb@paralleldata.com on Tue, Nov 20, 2001 at 02:55:07PM -0600
References: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO> <3BFAC32B.4282F6D6@paralleldata.com>
Message-ID: <20011120162551.Q89961@velocet.ca>

On Tue, Nov 20, 2001 at 02:55:07PM -0600, W Bauske's all...
> Steve Gaudet wrote:
> >
> > Hello,
> >
> > > On Mon, Nov 19, 2001 at 10:00:46AM -0500, Steve Gaudet's all...
> > > > Here's some of the network GIG E hardware we'd like to recommend:
> > > >
> > > > AceNIC/NetGear GA620(T)/3C985B
> > > > SysKonnect
> > > > NS chipset:
> > > > Cameo SOHO-GA2000T SOHO-GA2500T
> > > > D-Link DGE-500T
> > > > PureData PDP8023Z-TG
> > > > SMC SMC9462TX
> > > > NetGear GA622
> > >
> > > How do you find the performance of these NS83820 cards? Do they do
> > > block interrupt xfer or whatever it is for more efficient
> > > xfer? How much
> > > system/interrupt time do they chew up?
> >
> > The NS based stuff is low-end and cheap - you get what you pay for - but the
> > drivers are rock solid and provide a good way to get started with GigE.
> >
> > On a high enough powered box, you might even get decent throughput but it's
> > at a cost of cycles.
>
> If you consider 57MB/sec for a $45 card bad, then the ns83820's
> are a bad deal. I consider that a good buy. CPU is around
> 30% of a P4 1.5GHz system, both sending and receiving sides.

I don't know what CPU was in use at the time, but I've gotten over 350Mbps out of them with one machine:

dd if=/dev/zero bs=1M count=1000 | nc -w 1 othermachine 33333

othermachine: nc -w 1 -l -p 33333 | dd of=/dev/null

(dd on freebsd tells Bps on stderr on termination or interrupt)

> It would be nice if the cpu was lower but it's acceptable for what
> I do. (YMMV)

With gromacs going between 2 dual-athlon 1.33GHz CPUs running the d.dppc benchmark I only noticed about 2-3% system time and 87% usertime. (Linux doesn't separate interrupt and system time like freebsd does, and I was using linux in this test.)

> Also would be nice if Gbe switches were cheaper. Latest pricing I
> see is $600 for an 8 port switch, or $75/port+$45/card for $120
> per connection.

Why not just go for SCALI? Since when does COST matter on the Beowulf list? :)

/kc

--
Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA

From rock16905 at yahoo.com  Tue Nov 20 17:25:31 2001
From: rock16905 at yahoo.com (xinhuang zhang)
Date: Wed Nov 25 01:01:53 2009
Subject: (no subject)
Message-ID: <20011121012531.86532.qmail@web20808.mail.yahoo.com>

Hello;

After installing mpich and doing the test, I got the following error message. bw-05 is the host and bw-04 is one of the nodes. I hope someone can help me solve this problem. Thanks a lot!

F. rock

[biocompu@bw-05 examples]$ cd test
[biocompu@bw-05 test]$ make testing
(cd pt2pt ; ./runtests -check )
Failed to run simple program!
Output from run attempt was
*** Testing Unexpected messages ***
bash: /home/biocompu/mpich-1.2.2.3/examples/test/pt2pt/./third: No such file or directory
p0_1290: p4_error: Timeout in making connection to remote process on bw-04: 0
Connection failed for reason: : Connection refused
Connection failed for reason: : Connection refused
/home/biocompu/mpich-1.2.2.3/bin/mpirun: line 1: 1290 Broken pipe /home/biocompu/mpich-1.2.2.3/examples/test/pt2pt/./third -p4pg /home/biocompu/mpich-1.2.2.3/examples/test/pt2pt/PI1210 -p4wd /home/biocompu/mpich-1.2.2.3/examples/test/pt2pt
*** Testing Unexpected messages ***
mpirun program was /home/biocompu/mpich-1.2.2.3/bin/mpirun
mpirun command was /home/biocompu/mpich-1.2.2.3/bin/mpirun -mvhome -np 2 ./third >third.out 2>&1
make: *** [runtest] Error 1
[biocompu@bw-05 test]$

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month. http://geocities.yahoo.com/ps/info1
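The pattern in that transcript ("No such file or directory" for the test binary, then a p4 connection timeout) usually means the node cannot see the binaries at the same path, or rsh to the node is not working. A sketch of the standard first checks, reusing the hostnames and paths from the report (this assumes the default rsh-based ch_p4 startup):

  # can the master run commands on the node without a password prompt?
  rsh bw-04 date
  # does the node see the mpich tree at the same path (NFS mount or local copy)?
  rsh bw-04 ls /home/biocompu/mpich-1.2.2.3/examples/test/pt2pt/third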
From Eugene.Leitl at lrz.uni-muenchen.de  Wed Nov 21 04:32:49 2001
From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl)
Date: Wed Nov 25 01:01:53 2009
Subject: LINUX PC CLUSTER AND SEISMIC MIGRATION
Message-ID:

Sorry for forwarding this late, I'm processing a large backlog.

---------- Forwarded message ----------
Date: Sun, 11 Nov 2001 18:21:41 -0600
From: Roberto Cervantes Muller
To: linuxbios@lanl.gov
Subject: info

Dear Sir:

LINUX PC CLUSTER AND SEISMIC MIGRATION

If we run a migration software package in a PC cluster environment, we distribute all tasks among the cluster nodes, and also the disk space available on each node. What would happen if one of the nodes goes down while performing a pre-stack depth migration, which might take months? Do we have to restart the process, or else what would happen to that data?

PROBABLY WILL DEPEND ON COMMUNICATION SOFTWARE:

It depends on the software.
The package I am most familiar with, MPICH, cannot recover from a failed node, and the entire parallel process must be restarted. I don't know if there are any generic software systems that can handle dynamic changes to the cluster. Otherwise, it may be possible to adjust your application so it will checkpoint itself at stages of the computation, so it can be completely restarted at a checkpoint after a node fails.

thanks and regards,
________________________________________
Roberto Cervantes Muller
Technical Manager
Tesenergy Services
E-mail: robertoc@tesenergy.net
E-fax: 1 509 696 8501
URL=http://www.tesenergy.net
________________________________________

**********************************************************************
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify me. This footnote also confirms that this email message has been swept for the presence of computer viruses.
**********************************************************************

From kinghorn at pqs-chem.com  Wed Nov 21 08:59:07 2001
From: kinghorn at pqs-chem.com (Donald B. Kinghorn)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
Message-ID: <0GN5005RCTTOGB@mta4.rcsntx.swbell.net>

... A couple more notes ...

The memory I am using: I bought a couple hundred Crucial 256MB reg ECC PC2100 modules, so I'm using 2 of those on each board, and I've added 1 Infineon 64x4 512MB reg ECC module to each board. (I'm not sure if the Infineon based modules are made by Infineon or someone else ... I got them from MicroPro for $109, but they have gone up to ~$135.) Greg Lindahl mentioned that it's not good to mix modules, and I agree, but what I've got in these boxes seems to be working OK.

Kernel: I'm using a stock Mandrake 2.4.13 kernel build ... the AMD7411 only gets set to ata33 (DMA mode 2, I believe). I have a bunch of machines that I've forced to ata100 by passing ide0=ata66 on the lilo append line; this forces the controller into DMA mode 5 (ata100). It seems to be stable, but it's too early to say for sure. I've seen kernel patches that I think will detect and set up the controller correctly, but I haven't tried them. The patch (fix) may be in the 2.4.14/15 source, but I'm not sure. (?)

Also, I just read a review of new motherboards shown at the COMDEX show ... there are a bunch of new dual Athlon boards in the mix ... so it looks like we'll have more choices soon. Hurray!
http://www.anandtech.com/mb/showdoc.html?i=1560&p=1

Best regards
-Don

Dr. Donald B. Kinghorn
Parallel Quantum Solutions LLC
http://www.pqs-chem.com

From jharrop at shaw.ca  Thu Nov 22 10:01:03 2001
From: jharrop at shaw.ca (4j harrop)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
Message-ID: <5.0.2.1.0.20011122100055.00aa0520@pop3.norton.antivirus>

Hi,

I've been a lurker on this list for some time. The conversations here have been most helpful while I've been working on getting up to speed. I have recently built a small beowulf cluster and am now looking at getting a FORTRAN90 compiler. Can anyone on the list recommend which are better for Linux (Redhat 7.2) using mpich (1.2.2.3)?

If you have negative comments that you would rather not publish to the list, please contact me directly at jharrop@shaw.ca

Thanks in advance!
John Harrop
Adapt Systems Corp
Cyberquest Geoscience Ltd

From ron_chen_123 at yahoo.com  Thu Nov 22 17:55:00 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:53 2009
Subject: [PBS-USERS] big cluster
In-Reply-To:
Message-ID: <20011123015500.41676.qmail@web14703.mail.yahoo.com>

"... NCSA runs Maui on 512 node PBS Linux cluster."
See: http://www.supercluster.org/main.html

You may want to apply the scaling patch so that PBS can scale beyond 500 hosts:
See: http://www-unix.mcs.anl.gov/openpbs/

I've heard many rumors about SGE and PBS. Looks like there is a company spreading the rumors:
http://supportforum.sun.com/cgi-bin/WebX.cgi?13@217.dvcxaQuMfpL^0@.ee8e727

Or if you can't get the page, follow:
http://www.sun.com/software/gridware/support.html
Technical Forums -> Compute Farms -> some comments overheard by Platform Computing rep.

-Ron

--- Tamar Domany wrote:
>
> I heard a rumor that PBS has a scalability problem
> when working with more
> then a 200 compute nodes.
> Is that true ?
> Does any one has a experience ( good or bad ) with
> cluster that size or
> bigger ?
>
> Thanks
> Tamar
>
> __________________________________________________________________________
> To unsubscribe: email majordomo@openpbs.org with
> body "unsubscribe pbs-users"
> For message archives: visit
> http://openpbs.org/UserArea/pbs-users.html
> - - - - - - - - - - - - - -
> Academic Site? Use PBS Pro free, see:
> http://www.pbspro.com/academia.html
> OpenPBS and the pbs-users mailing list are sponsored
> by Veridian.
> __________________________________________________________________________

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month. http://geocities.yahoo.com/ps/info1

From yoon at bh.kyungpook.ac.kr  Thu Nov 22 23:22:35 2001
From: yoon at bh.kyungpook.ac.kr (Yoon Jae Ho)
Date: Wed Nov 25 01:01:53 2009
Subject: 32 bit vs 64 bit computer ?
Message-ID: <001d01c173ef$a16ae600$5f72f2cb@LocalHost>

I want to know the exact definition of a 32-bit computer (PC) vs a 64-bit computer, and why we haven't been able to make a 128-bit computer for such a long time.

I also don't know the maximum number a 32-bit vs a 64-bit computer can calculate with exactly, without error.

With different architecture PCs, for example AMD, Intel, and Mac CPUs, is it possible to communicate the calculation results to each other?

And with the same OS, for example Linux, is it possible to make one Beowulf using Alpha (64-bit) & Intel (32-bit) computers? I mean, can we communicate the calculation results with each other (32-bit vs 64-bit) during calculation with the same OS?

Thank you very much

---------------------------------------------------------------------
Yoon Jae Ho  Economist  POSCO Research Institute
yoon@bh.kyungpook.ac.kr  jhyoon@mail.posri.re.kr
http://ie.korea.ac.kr/~supercom/  Korea Beowulf Supercomputer
http://members.ud.com/services/teams/team.htm?id=264C68D5-CB71-429F-923D-8614F419065D  Help the people with your PC
Imagination is more important than knowledge.  A. Einstein
http://www.kichun.co.kr 2001.1.6
http://www.c3tv.com 2001.1.10
------------------------------------------------------------------------

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20011123/ba5cf820/attachment.html

From jakob at unthought.net  Fri Nov 23 00:57:33 2001
From: jakob at unthought.net (Jakob Østergaard)
Date: Wed Nov 25 01:01:53 2009
Subject: 32 bit vs 64 bit computer ?
In-Reply-To: <001d01c173ef$a16ae600$5f72f2cb@LocalHost>; from yoon@bh.kyungpook.ac.kr on Fri, Nov 23, 2001 at 04:22:35PM +0900
References: <001d01c173ef$a16ae600$5f72f2cb@LocalHost>
Message-ID: <20011123095733.A9896@unthought.net>

On Fri, Nov 23, 2001 at 04:22:35PM +0900, Yoon Jae Ho wrote:
>
> I want to know the exact definition of a 32-bit computer (PC) vs a 64-bit computer,
> and why we haven't been able to make a 128-bit computer for such a long time.

*Usually* these bits refer to the addressing capability of the machine. A 32-bit machine can address a 32-bit memory space, meaning 2^32 bytes, or 4 GB.

Now, current 32-bit Intel machines actually contain some hacks so that the CPU can address more than 32 bits. One process can still only address a 32-bit space though (yes, I know you can do windowing/mmap hacks to sort-of address more, but the process will still live in one 32-bit address space).

A 64-bit machine can address a 64-bit memory space. I suppose that's around 16 exabytes or something like that. It's the ridiculous amount of ~10^19 bytes.

Now, a 128-bit machine would address around 10^38 bytes. There's something like 10^86 elementary particles in the known parts of the universe - building a machine with an actual 128-bit physical address space is going to be challenging with today's technology, to say the least :)

> I also don't know the maximum number a 32-bit vs a 64-bit computer can calculate with exactly, without error.

If you use floating point, you usually use "float" or "double" types. Those have been 32 bits (float) and 64 bits (double) on all 32-bit and 64-bit systems regardless, forever. It's an IEEE standard.

> With different architecture PCs, for example AMD, Intel, and Mac CPUs, is it possible to communicate the calculation results to each other?

Communication happens with a protocol. If your protocol is standardised among platforms, you can. If you didn't make your protocol work between different machines, you can't.

> And with the same OS, for example Linux, is it possible to make one Beowulf using Alpha (64-bit) & Intel (32-bit) computers?

Sure, it's possible. Now, many parallel applications will either use a protocol that is not "safe" between different architectures, or the application will depend on special numerical properties of specific architectures. Mixing architectures can give some headaches there. But then again, it would be trivial to make sure that parallel jobs only execute on one particular architecture.

Whether it's desirable to mix architectures depends entirely on what kind of applications you are planning to run. Diversity can be as useful as it can be painful. It all depends...

> I mean, can we communicate the calculation results with each other (32-bit vs 64-bit) during calculation with the same OS?

Again, communication happens over a protocol. If your protocol can make it work, it will work. If your protocol cannot make it work, it cannot work - operating systems do not matter here.

--
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:
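Two standard commands make the 32-bit/64-bit distinction concrete on any Linux box (nothing exotic assumed here; /bin/ls is just a convenient binary to inspect):

  # word size the userland was built for: prints 32 or 64
  getconf LONG_BIT
  # reports "ELF 32-bit ..." or "ELF 64-bit ..." for a given executable
  file /bin/ls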
From bdorland at kendall.umd.edu  Fri Nov 23 01:16:03 2001
From: bdorland at kendall.umd.edu (Bill Dorland)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
In-Reply-To: <5.0.2.1.0.20011122100055.00aa0520@pop3.norton.antivirus> (message from 4j harrop on Thu, 22 Nov 2001 10:01:03 -0800)
References: <5.0.2.1.0.20011122100055.00aa0520@pop3.norton.antivirus>
Message-ID: <200111230916.fAN9G3W30351@kendall.umd.edu>

> I have recently built a small beowulf cluster and am now looking at
> getting a FORTRAN90 compiler. Can anyone on the list recommend
> which are better for Linux (Redhat 7.2) using mpich (1.2.2.3)?

I've tested three Fortran 90 compilers in this basic environment, on a suite of scientific codes. They are the Portland Group's f90, NAG f95, and Lahey/Fujitsu's lf95. I also tried the Portland Group HPF compiler.

I have found the Portland Group products to be heavily bug-ridden, and essentially unusable by a group of scientists that are actively developing code that uses Fortran 90 (or HPF) features. Moreover, carefully constructed bug reports submitted to the company failed to stir them. I strongly advise avoiding this company. My colleagues at an American national laboratory independently came to the same conclusions, based on their problems with the PG products.

The other two compilers, on the other hand, are both very good. My colleagues and I are fully satisfied with the performance and compatibility with the Fortran 90/95 standards of both. I expect that either would perform well for you.

I haven't tried Absoft's f90 compiler, but I will do so next week. Let me know if you are interested in the results.

--Bill

From Daniel.Kidger at quadrics.com  Fri Nov 23 01:56:37 2001
From: Daniel.Kidger at quadrics.com (Daniel Kidger)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA739D338@stegosaurus.bristol.quadrics.com>

>> I have recently built a small beowulf cluster and am now looking at
>> getting a Fortran90 compiler. Can anyone on the list recommend
>> which are better for Linux (Redhat 7.2) using mpich (1.2.2.3)?
>
>>I've tested three Fortran 90 compilers in this basic environment, on a
>>suite of scientific codes. They are the Portland Group's f90, NAG
>>f95, and Lahey/Fujitsu's lf95. I also tried the Portland Group HPF
>>compiler.
>
>>I have found the Portland Group products to be heavily bug-ridden, and
>>essentially unusable by a group of scientists that are actively
>>developing code that uses Fortran 90 (or HPF) features. Moreover,
>>carefully constructed bug reports submitted to the company failed to
>>stir them. I strongly advise avoiding this company.

I would be very careful about your damning of Portland. It is widely used and has a large base of users, so expect some flames!

However, you do not mention the Intel Compiler. In virtually all our tests on dual Pentium 4s, it outperformed the others that we tried. Also it works fine with mpich (apart from the fact that you need to build mpich to expect only a single underscore on subroutines).

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.
From bdorland at kendall.umd.edu Fri Nov 23 02:50:09 2001
From: bdorland at kendall.umd.edu (Bill Dorland)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: <010C86D15E4D1247B9A5DD312B7F5AA739D338@stegosaurus.bristol.quadrics.com> (message from Daniel Kidger on Fri, 23 Nov 2001 09:56:37 -0000)
References: <010C86D15E4D1247B9A5DD312B7F5AA739D338@stegosaurus.bristol.quadrics.com>
Message-ID: <200111231050.fANAo9c30456@kendall.umd.edu>

> However you do not mention the Intel Compiler. In virtually all our
> tests on dual Pentium 4s, it outperformed the others that we tried.

I have never used the Intel compiler. Our cluster (Imperial College, London) is built around AMD Athlons. Is the Intel compiler compatible with Athlons?

--Bill

From Daniel.Kidger at quadrics.com Fri Nov 23 03:29:10 2001
From: Daniel.Kidger at quadrics.com (Daniel Kidger)
Date: Wed Nov 25 01:01:53 2009
Subject: 32 bit vs 64 bit computer ?
Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA739D33B@stegosaurus.bristol.quadrics.com>

> I want to know the exact definition of the 32 bit computer (PC) vs 64 bit computer.
> I don't know how much (the maximum number) the 32 bit computer vs 64 bit makes exact calculation without error.

It is a common misunderstanding to equate a 32-bit computer with 32-bit numbers in calculations. For example, my old ZX Spectrum was an 8-bit computer (and so could only address 64kB of memory) but stored floating point numbers in 40 bits. What can also add to the confusion is that Intel Pentiums (which are 32-bit machines) have always had 64-bit floating point numbers, but internal to the CPU floating point units they are stored as 80 bits.

> And with different architecture PCs - for example AMD, Intel, Mac CPU - is it possible to communicate the calculation results to each other?
> And with the same OS - for example Linux - is it possible to make one beowulf using Alpha (64 bit) & Intel (32 bit) computers?

Your other question was about communication between heterogeneous architectures. This again has always been possible. Before MPI (unfortunately) came to dominate message-passing, PVM was the standard library used. PVM is designed for heterogeneous systems. For example, I have a code that uses MPI internally on both a Cray T3E and also a Fujitsu Vector Processor but which uses PVM to communicate between the two big machines.

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger@quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20011123/93058e45/attachment.html

From jcownie at etnus.com Fri Nov 23 03:42:10 2001
From: jcownie at etnus.com (James Cownie)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: Message from Bill Dorland of "Fri, 23 Nov 2001 05:50:09 EST." <200111231050.fANAo9c30456@kendall.umd.edu>
Message-ID: <167Eih-0SK-00@etnus.com>

> I have never used the Intel compiler. Our cluster (Imperial
> College, London) is built around AMD Athlons. Is the Intel compiler
> compatible with Athlons?
At higher optimisation levels, when it is compiling for the Pentium 4 it will generate SSE-2 instructions, which are not implemented on the Athlons (yet). I'm not sure whether the license _allows_ you to use it to compile for Athlons, or whether it checks somewhere in the runtime to ensure that you don't...

You can download it and try it at only the cost of your time (and it may remain free to you as an academic even for production use). It's easy to find on Intel's site if you want to play with it.

-- Jim
James Cownie
Etnus, LLC.    +44 117 9071438
http://www.etnus.com

From jcownie at etnus.com Fri Nov 23 04:27:12 2001
From: jcownie at etnus.com (James Cownie)
Date: Wed Nov 25 01:01:53 2009
Subject: 32 bit vs 64 bit computer ?
In-Reply-To: Your message of "Fri, 23 Nov 2001 11:29:10 GMT." <010C86D15E4D1247B9A5DD312B7F5AA739D33B@stegosaurus.bristol.quadrics.com>
Message-ID: <167FQG-0Ta-00@etnus.com>

> Before MPI (unfortunately) came to dominate message-passing, PVM was
> the standard library used. PVM is designed for heterogeneous
> systems. For example I have a code that uses MPI internally on both a
> Cray T3E and also a Fujitsu Vector Processor but which uses PVM to
> communicate between the two big machines.

Despite the implication above that MPI is inferior to PVM in its support of heterogeneous systems, the MPI standard _was_ designed for heterogeneous systems. A conforming MPI program provides enough information on both send and receive to allow the MPI implementation to translate data between machine formats (without requiring a function call per data element to achieve it, as PVM used to do!).

The issue which is likely preventing you from exploiting this is that of _starting_ MPI processes on these two different machines and exploiting the vendor-optimised MPI on both of them. Since Cray has no incentive to make their MPI handle a Fujitsu VPP, and Fujitsu has no incentive to make their MPI handle a Cray T3E, interoperability of _vendor optimised_ MPIs is small. (Though, of course, your Quadrics MPI will work in an optimised fashion with the Fujitsu VPP and T3E, I expect :-)

However, if you're prepared to use a portable MPI such as MPICH, then you can easily handle heterogeneous machines inside a single program. (See the Globus/MPI work, for instance.) I have also seen work which used the MPI profiling interface to wrap a vendor MPI so that it would inter-operate with a portable MPI.

So, in summary:
1) The MPI specification fully supports heterogeneity.
2) There are MPI implementations which support heterogeneity.
3) You're living in another universe if you think that vendors will spend any time making their MPI implementations inter-operate off-box with their competitors, rather than tweaking their on-box performance in the hope of wiping out said competitors !

-- Jim
James Cownie
Etnus, LLC.    +44 117 9071438
http://www.etnus.com
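Point (1) above is visible in the API itself: every send and receive names an MPI datatype rather than a raw byte count, which is precisely the information a heterogeneous implementation needs to convert representations in flight. A minimal sketch, not tied to any particular vendor's MPI (build with mpicc, run with mpirun -np 2):

#include <mpi.h>
#include <stdio.h>

/* Rank 0 sends a double to rank 1.  Because both calls declare
 * MPI_DOUBLE instead of "8 bytes", an MPI built for mixed
 * architectures can translate the wire format en route. */
int main(int argc, char **argv)
{
    int rank;
    double x = 2.718281828;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %f\n", x);
    }
    MPI_Finalize();
    return 0;
}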
From Tobias.Peuker at materna.de Fri Nov 23 05:36:51 2001
From: Tobias.Peuker at materna.de (Tobias.Peuker@materna.de)
Date: Wed Nov 25 01:01:53 2009
Subject: Problem with Sun Grid Engines qrsh
Message-ID: <01A24CDFE59DD411899F00A0C91012A96B4570@chewbacca.materna.de>

Hello,

I have a little problem. I am setting up a parallel compiling farm with SGE qmake. But I have a little problem that I can't solve:

When I try to use the qrsh command from SGE, the following error message occurs:

bash: ulimit: cannot modify limit: Operation not permitted

Normal RSH and everything else works perfectly. Does anybody have an idea how to solve this problem?

Regards,
Tobi

From steveb at aei-potsdam.mpg.de Fri Nov 23 05:46:38 2001
From: steveb at aei-potsdam.mpg.de (Steven Berukoff)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: <200111231050.fANAo9c30456@kendall.umd.edu>
Message-ID:

Yes, you can use the Intel compilers to compile code for Athlons. Since the AMD instruction set supports SSE, you can include Pentium 3 optimizations that improve performance a bit.

What I'd really like to see, however, is gcc for Athlon or, better, a compiler from AMD!

Cheers
Steve

> > However you do not mention the Intel Compiler. In virtually all our
> > tests on dual Pentium 4s, it outperformed the others that we tried.
>
> I have never used the Intel compiler. Our cluster (Imperial College,
> London) is built around AMD Athlons. Is the Intel compiler compatible
> with Athlons?
>
> --Bill
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

=====
Steve Berukoff                          tel: 49-331-5677233
Albert-Einstein-Institute               fax: 49-331-5677298
Am Muehlenberg 1, D14477 Golm, Germany  email: steveb@aei.mpg.de

From Florent.Calvayrac at univ-lemans.fr Fri Nov 23 06:47:13 2001
From: Florent.Calvayrac at univ-lemans.fr (Florent.Calvayrac)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
In-Reply-To: <200111230916.fAN9G3W30351@kendall.umd.edu> from "Bill Dorland" at Nov 23, 2001 04:16:03 AM
Message-ID: <200111231447.PAA13980@pecbip1.univ-lemans.fr>

> I have found the Portland Group products to be heavily bug-ridden, and
> essentially unusable by a group of scientists that are actively
> developing code that uses Fortran 90 (or HPF) features. Moreover,

Right: compilation and test of LINPACK on our system gives at least a 50% failure rate on the precision tests with pgf77, versus 0% with g77. However, when it works, the generated code is at least 20% faster than with other compilers.

To reply to another message, the PlayStation 2 has a 128-bit processor....

--
Florent Calvayrac                        | Laboratoire de Physique de l'Etat Condense
UMR-CNRS 6087                            | http://www.univ-lemans.fr/~fcalvay
Universite du Maine-Faculte des Sciences | 72085 Le Mans Cedex 9

From ctierney at hpti.com Fri Nov 23 07:36:38 2001
From: ctierney at hpti.com (Craig Tierney)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: <200111231050.fANAo9c30456@kendall.umd.edu>; from bdorland@kendall.umd.edu on Fri, Nov 23, 2001 at 05:50:09AM -0500
References: <010C86D15E4D1247B9A5DD312B7F5AA739D338@stegosaurus.bristol.quadrics.com> <200111231050.fANAo9c30456@kendall.umd.edu>
Message-ID: <20011123083638.A8562@hpti.com>

On Fri, Nov 23, 2001 at 05:50:09AM -0500, Bill Dorland wrote:
>
> > However you do not mention the Intel Compiler. In virtually all our
> > tests on dual Pentium 4s, it outperformed the others that we tried.
>
> I have never used the Intel compiler. Our cluster (Imperial College,
> London) is built around AMD Athlons. Is the Intel compiler compatible
> with Athlons?
>
> --Bill

I tested out a dual Athlon and a dual P4 system with the Portland Group and Intel Fortran compilers. Yes, you can run the Intel compiler on the AMD. It works quite well. The results with my code showed that the Intel compiler was faster on both platforms. Your mileage may vary.

The only problem with the Intel compiler is that I have had some problems getting it to take some F77 code that other compilers can handle.
I usually can work around the internal compiler errors that the Intel system generates; it just takes a little time to find them.

Craig

> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Craig Tierney (ctierney@hpti.com)

From ctierney at hpti.com Fri Nov 23 08:00:21 2001
From: ctierney at hpti.com (Craig Tierney)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: ; from steveb@aei-potsdam.mpg.de on Fri, Nov 23, 2001 at 02:46:38PM +0100
References: <200111231050.fANAo9c30456@kendall.umd.edu>
Message-ID: <20011123090021.A8761@hpti.com>

On Fri, Nov 23, 2001 at 02:46:38PM +0100, Steven Berukoff wrote:
>
> Yes, you can use the Intel compilers to compile code for Athlons. Since
> the AMD instruction set supports SSE, you can include Pentium 3
> optimizations that improve performance a bit.

Does anyone know how similarly or differently the SSE instructions are implemented on Athlon vs. P3/P4 chips? Are the operation counts the same, or is one slower than the other?

>
> What I'd really like to see, however, is gcc for Athlon or, better, a
> compiler from AMD!

An AMD compiler would be nice, but it is not going to happen (opinion, not fact). However, an easy way for them to achieve this is to offer $$$$ to any compiler vendor to implement the 3DNow instructions natively.

Craig

>
> Cheers
> Steve
>
> > > However you do not mention the Intel Compiler. In virtually all our
> > > tests on dual Pentium 4s, it outperformed the others that we tried.
> >
> > I have never used the Intel compiler. Our cluster (Imperial College,
> > London) is built around AMD Athlons. Is the Intel compiler compatible
> > with Athlons?
> >
> > --Bill
> > _______________________________________________
> > Beowulf mailing list, Beowulf@beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
> =====
> Steve Berukoff                          tel: 49-331-5677233
> Albert-Einstein-Institute               fax: 49-331-5677298
> Am Muehlenberg 1, D14477 Golm, Germany  email: steveb@aei.mpg.de
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Craig Tierney (ctierney@hpti.com)

From rauch at inf.ethz.ch Fri Nov 23 08:51:33 2001
From: rauch at inf.ethz.ch (Felix Rauch)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
In-Reply-To: <20011119201418.L66460@velocet.ca>
Message-ID:

On Mon, 19 Nov 2001, Velocet wrote:
> On Mon, Nov 19, 2001 at 10:00:46AM -0500, Steve Gaudet's all...
> > Here's some of the network GigE hardware we'd like to recommend:
> >
> > AceNIC/NetGear GA620(T)/3C985B
> > SysKonnect
> > NS chipset:
> >   Cameo SOHO-GA2000T SOHO-GA2500T
> >   D-Link DGE-500T
> >   PureData PDP8023Z-TG
> >   SMC SMC9462TX
> >   NetGear GA622
>
> How do you find the performance of these NS82830 cards? Do they do
> block interrupt xfer or whatever it is for more efficient xfer? How much
> system/interrupt time do they chew up?

I don't know about the 82830, but a student in our group is working on a (special) driver for the DP83820 chip on an ASANTE GigaNIX card. While the cards were cheap and have a rich feature set, there are mainly two problems as far as we can see:

- The card has hardware bugs.
The student discovered 3 bugs, but could fortunately work around them.

- The FIFOs on the card are very small (8 KB TX and 32 KB RX, if I remember correctly). The student had to fiddle quite a bit with some of the parameters of the card to get acceptable performance. This might also be responsible for the relatively low throughput (it could also be the implementation of the DMA engine). The card seems unable to transfer more than about 70-80 MB/s _without_ any protocol stack (the sender's driver just transmits the same data over and over, while the receiver simply marks received packets as `handled').

As a comparison: our hamachi cards transfer more than 100 MB/s _with_ TCP/IP on the same machines!

So, in our experience the DP83820-based cards are not the best, but they work.

- Felix

--
Felix Rauch                      | Email: rauch@inf.ethz.ch
Institute for Computer Systems   | Homepage: http://www.cs.inf.ethz.ch/~rauch/
ETH Zentrum / RZ H18             | Phone: ++41 1 632 7489
CH - 8092 Zuerich / Switzerland  | Fax: ++41 1 632 1307

From Daniel.Kidger at quadrics.com Fri Nov 23 10:15:30 2001
From: Daniel.Kidger at quadrics.com (Daniel Kidger)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA739D33C@stegosaurus.bristol.quadrics.com>

> Right: compilation and test of LINPACK on our system
> gives at least a 50% failure rate on the precision tests with pgf77,
> versus 0% with g77. However, when it works, the
> generated code is at least 20% faster than with
> other compilers.

That does not prove that pgf77 is broken! What if linpack (./xhpl) has a bug whereby a variable is not initialised to zero? pgf77 may be acting correctly by not having to initialise it, and g77 may be over-keen in setting all undeclared values to zero.

Yours,
Daniel.

(ps. yes, I _do_ suspect this is actually true - I have spurious problems with the Intel compiler on xhpl too)

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger@quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

From djholm at fnal.gov Fri Nov 23 11:10:59 2001
From: djholm at fnal.gov (Don Holmgren)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: <20011123090021.A8761@hpti.com>
Message-ID:

On Fri, 23 Nov 2001, Craig Tierney wrote:
> On Fri, Nov 23, 2001 at 02:46:38PM +0100, Steven Berukoff wrote:
> >
> > Yes, you can use the Intel compilers to compile code for Athlons. Since
> > the AMD instruction set supports SSE, you can include Pentium 3
> > optimizations that improve performance a bit.
>
> Does anyone know how similarly or differently the SSE instructions are
> implemented on Athlon vs. P3/P4 chips? Are the operation counts
> the same, or is one slower than the other?

At the very bottom of the page

    http://qcdhome.fnal.gov/sse/

I have a table with cycle counts posted for a number of matrix-matrix and matrix-vector routines as measured on a P-III (Coppermine), P4, and an Athlon MP. Times are posted for both a pure-C version of each routine, built with gcc, as well as for an SSE version. The sources for each are available at

    http://qcdhome.fnal.gov/sse/catalog.html

The results are a mixed bag, with each flavor of processor sometimes first, second, or third. I'm using only a small subset of SSE - mostly shufps, addps, mulps, with a few xorps, movaps, and movups thrown in. I haven't timed individual instructions on all three processors.
Don Holmgren
Fermilab

From jcandy1 at san.rr.com Fri Nov 23 13:13:19 2001
From: jcandy1 at san.rr.com (Jeff Candy)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
References: <5.0.2.1.0.20011122100055.00aa0520@pop3.norton.antivirus> <200111230916.fAN9G3W30351@kendall.umd.edu>
Message-ID: <3BFEBBEF.86CA2754@san.rr.com>

Bill Dorland wrote:

> I've tested three Fortran 90 compilers in this basic environment, on a
> suite of scientific codes. They are the Portland Group's f90, NAG
> f95, and Lahey/Fujitsu's lf95. I also tried the Portland Group HPF
> compiler.
>
> I have found the Portland Group products to be heavily bug-ridden, and
> essentially unusable by a group of scientists that are actively
> developing code that uses Fortran 90 (or HPF) features. Moreover,
> carefully constructed bug reports submitted to the company failed to
> stir them. I strongly advise avoiding this company. My colleagues at
> an American national laboratory independently came to the same
> conclusions, based on their problems with the PG products.
>
> The other two compilers, on the other hand, are both very good. My
> colleagues and I are fully satisfied with the performance of both,
> and with their compatibility with the Fortran 90/95 standards. I
> expect that either would perform well for you.

I have grown increasingly unhappy with The Portland Group and its compilers over the last year. In comparison with the Lahey/Fujitsu product (lf95), for example, the quality of syntax and run-time error-checking is worse, and license management is more tedious. Code generated with pgf90 tends to be slightly faster, but not by any amount that would recommend its use. I believe an average user will produce bug-free code faster with lf95 than with pgf90.

Jeff

From serguei.patchkovskii at sympatico.ca Fri Nov 23 14:31:28 2001
From: serguei.patchkovskii at sympatico.ca (serguei.patchkovskii@sympatico.ca)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
Message-ID: <20011123223128.EJXU24249.tomts11-srv.bellnexxia.net@[209.226.175.18]>

> What if linpack (./xhpl) has a bug whereby a variable
> is not initialised to zero?
>
> pgf77 may be acting correctly by not having
> to initialise it, and g77 may be over-keen in
> setting all undeclared values to zero.

I strongly suspect that adding "-pc64 -Kieee" to your compilation options will allow the tests to complete.

Serguei

From bjornfot at erix.ericsson.se Mon Nov 19 00:49:18 2001
From: bjornfot at erix.ericsson.se (Lars Björnfot)
Date: Wed Nov 25 01:01:53 2009
Subject: Compilation problem.
References: <01b501c16bc3$02c6c3e0$906a7080@divine>
Message-ID: <3BF8C78E.901668D0@erix.ericsson.se>

Hi,

I posted an answer for this some months ago; the patch is added below. Hope it works, though the versions are slightly newer.

Regards,
Lars

> "Zhifeng F. Chen" wrote:
> > Hi,
> > When compiling mvich-1.0a6.1 under mpich-1.2.2.3,
> > ./configure --with-device=via --with-arch=LINUX --without-romio -cflags="-DUSE_STDARG -O2 -DCPU_X86 -DNIC_GIGANET -DVIPL095" -lib="-lgnivipl -lpthread"
> > is fine.
> > When I came to make, it reports:
> > cc1: warnings being treated as errors
> > queue.c: In function `MPID_Search_unexpected_for_request':
> > queue.c:296: warning: implicit declaration of function `MPID_AINT_CMP'
> > make[3]: *** [queue.o] Error 1
> > Exit status from make was 2
> > make[2]: *** [mpilib] Error 1
> > make[1]: *** [mpi-modules] Error 2
> > make: *** [mpi] Error 2
> > Can anyone help me out?
> > ZF

The reason seems to be mpid.h, which exists in two versions; the mpid/via/mpid.h seems outdated.
I send a patch that works for me (mpich-1.2.1 and mvich-1.0a6.1). It's rough, just to get it to compile.

make mpilib    # fails:
               # queue.c:296: warning: implicit declaration of function `MPID_AINT_CMP'
               # see: diff ./mpid/ch2/mpid.h ./mpid/via/mpid.h
patch -p1 < patch-mpid.h
make mpilib    # succeeds w/o errors.

Regards,
Lars

> Jeffrey Tilson wrote:
> Hi,
> This is my first attempt with mvich (1.0a6.1). I'm using mpich 1.2.2. I have a
> small Emulex (cLAN 1000) connected cluster running RH 6.2/2.2.19. I've pretty
> much followed the mvich installation instructions. The problem is the function
> MPID_AINT_CMP. It doesn't appear to be defined anywhere, nor used by any code
> other than queue.c. Can someone suggest a solution to this?
> Thanks,
> --jeff

*** mpich-1.2.1/mpid/via/mpid.h.orig	Tue Jul  4 01:58:12 2000
--- mpich-1.2.1/mpid/via/mpid.h	Wed Jun 20 23:57:51 2001
***************
*** 99,108 ****
--- 99,110 ----
  typedef int MPID_Aint;
  #define MPID_AINT_SET(a,b) a = b
  #define MPID_AINT_GET(a,b) a = b
+ #define MPID_AINT_CMP(a,b) (a) == (b)
  #elif defined(MPID_LONG8)
  typedef long MPID_Aint;
  #define MPID_AINT_SET(a,b) a = b
  #define MPID_AINT_GET(a,b) a = b
+ #define MPID_AINT_CMP(a,b) (a) == (b)
  #else
  #define MPID_AINT_IS_STRUCT
  /* This is complicated by the need to set only the significant bits when
***************
*** 115,123 ****
--- 117,127 ----
  #ifndef POINTER_64_BITS
  #define MPID_AINT_SET(a,b) (a).low = (unsigned)(b)
  #define MPID_AINT_GET(a,b) (a) = (void *)(b).low
+ #define MPID_AINT_CMP(a,b) ((a).low == (b).low)
  #else
  #define MPID_AINT_SET(a,b) (a) = *(MPID_Aint *)&(b)
  #define MPID_AINT_GET(a,b) *(MPID_Aint *)&(a) = *&(b)
+ #define MPID_AINT_CMP(a,b) ((a).low == (b).low) && ((a).high == (b).high)
  #endif
  #endif
  #else /* Not MPID_HAS_HETERO */
***************
*** 131,136 ****
--- 135,141 ----
  {
  a = b;\
  DEBUG_H_INT(fprintf( stderr, "[%d] Aint get %x <- %x\n", MPID_MyWorldRank, a, b ));\
  }
+ #define MPID_AINT_CMP(a,b) (a) == (b)
  #endif
  typedef int MPID_RNDV_T;

From jyrki.huusko at vtt.fi Wed Nov 21 06:06:49 2001
From: jyrki.huusko at vtt.fi (Jyrki Huusko)
Date: Wed Nov 25 01:01:53 2009
Subject: Network simulator2 + Beowulf
Message-ID: <4.3.2.7.2.20011121155636.00f33e80@elemail.ele.vtt.fi>

Good day,

Has anyone used NS2 - Network Simulator - on a Beowulf? In other words, has anyone tried to parallelise the NS2 simulator using MPI and run it in a distributed environment? We are currently planning to develop a network simulator (like Opnet, GlomoSIM and NS2) for distributed computer systems (mainly Beowulf-type clusters), and thus we are quite interested in work already done in this field of study... if there is any information freely available....

Sincerely Yours,
Jyrki Huusko

"I think there's a world market for about five computers."
 -Thomas Watson (IBM)-

--
Jyrki Huusko, jyrki.huusko@vtt.fi
Kaitoväylä 1, P.O.BOX 1100, FIN-90571 OULU, FINLAND
Tel. +358 8 551 2111, Fax +358 8 551 2320
http://www.vtt.fi  http://www.willab.fi/telaketju

From jharrop at shaw.ca Wed Nov 21 11:42:26 2001
From: jharrop at shaw.ca (4j harrop)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
Message-ID: <5.0.2.1.0.20011121113440.00a208b0@pop3.norton.antivirus>

Hi, I've been a lurker on this list for some time. The conversations here have been most helpful while I've been working on getting up to speed. I have recently built a small beowulf cluster and am now looking at getting a FORTRAN90 compiler. Can anyone on the list recommend which are better for Linux (Redhat 7.2) using mpich (1.2.2.3)?
If you have negative comments that you would rather not publish to the list, please contact me directly at jharrop@shaw.ca

Thanks in advance!

John Harrop

Adapt Systems Corp
Cyberquest Geoscience Ltd

From thanhaic at yahoo.com Fri Nov 23 00:49:12 2001
From: thanhaic at yahoo.com (thanh)
Date: Wed Nov 25 01:01:53 2009
Subject: ask
Message-ID: <001f01c173fb$bbc63840$2a016481@100.1.199.aic.com.vn>

Dear,

When programming with MPICH, I want to include some classes of the Qt lib; could you show me the way to do it?
Help me!
Thanks

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20011123/ce4897fa/attachment.html

From ron_chen_123 at yahoo.com Sat Nov 24 08:28:36 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:53 2009
Subject: Problem with Sun Grid Engines qrsh
In-Reply-To: <01A24CDFE59DD411899F00A0C91012A96B4570@chewbacca.materna.de>
Message-ID: <20011124162836.12011.qmail@web14703.mail.yahoo.com>

Please send problems with SGE to the opensource mailing-list (you'll need to subscribe first). If you need commercial support for SGE, please note that there are 3 third-party companies providing support for non-Solaris platforms.

Back to your question: is your .profile calling ulimit? Also, which Linux kernel are you using?

-Ron

--- Tobias.Peuker@materna.de wrote:
> Hello,
> I have a little problem. I am setting up a parallel
> compiling farm with SGE qmake. But I have a little
> problem that I can't solve:
>
> When I try to use the qrsh command from SGE, the
> following error message occurs:
>
> bash: ulimit: cannot modify limit: Operation not
> permitted
>
> Normal RSH and everything else works perfectly.
> Does anybody have an idea how to solve this problem?
>
> Regards,
> Tobi
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or
> unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From ron_chen_123 at yahoo.com Sat Nov 24 08:50:04 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:53 2009
Subject: ask
In-Reply-To: <001f01c173fb$bbc63840$2a016481@100.1.199.aic.com.vn>
Message-ID: <20011124165004.68297.qmail@web14706.mail.yahoo.com>

What kind of problem did you encounter?

Does "mpiCC -L<path to the Qt lib> -l<Qt lib>" work?

-Ron

--- thanh wrote:
> Dear,
> When programming with MPICH, I want to include some classes
> of the Qt lib; could you show me the way to do it?
> Help me!
> Thanks

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From atctam at csis.hku.hk Sat Nov 24 23:25:08 2001
From: atctam at csis.hku.hk (Anthony Tam)
Date: Wed Nov 25 01:01:53 2009
Subject: NFS service
Message-ID: <1006673108.3c009cd4126d2@intranet.csis.hku.hk>

Hi all,

I am looking for information regarding the support of highly-available or fault-tolerant NFS service on a medium-size cluster (> 32 nodes). Any idea where I can find this information?
Thanks.
Cheers

Anthony

e Y8 d8 88
d8b Y8 88*8e d8888 88*e 88 88 88*8e Y8b Y888
d888b Y8 88 88b 88 88 88 88 88 88 88b Y8b Y8
d888888888 88 888 88 88 88 88 88 88 888 Y8b
d888 b Y8 88 888 888 88 88 88 88 88 888 88
88
88

From per at computer.org Sun Nov 25 07:48:27 2001
From: per at computer.org (Per Jessen)
Date: Wed Nov 25 01:01:53 2009
Subject: network drivers - using 3c509 and 3c515 in the same system ?
Message-ID: <3C00D8A60000B61A@mta2n.bluewin.ch> (added by postmaster@bluewin.ch)

All,

I've been working on upgrading the master node in our cluster this weekend, and hit an issue with using a 3C509 and a 3C515 card in the same system. When the 3C515 module is loaded first, loading the 3C509 module will lock the system hard. Same goes if the card is a 3C509B (PnP-capable). If instead the 3C509 module is loaded first, the 3C515 driver cannot find the 3C515 card, and refuses to load.

I looked at using the newer 3C515.c from the Scyld page, but realised that it only works with 2.2, not 2.4 - and the master node is 2.4.14. I ended up using 2 x 3C515, but would like to know if anyone else has noticed this behaviour with a combination of 3C509 and 3C515 ?

tnx,
Per Jessen

regards,
Per Jessen, Zurich
http://www.enidan.com - home of the J1 serial console.

Windows 2001: "I'm sorry Dave ... I'm afraid I can't do that."

From rgb at phy.duke.edu Sun Nov 25 08:32:25 2001
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To:
Message-ID:

On Fri, 23 Nov 2001, Don Holmgren wrote:

> At the very bottom of the page,
> http://qcdhome.fnal.gov/sse/
> I have a table with cycle counts posted for a number of matrix-matrix
> and matrix-vector routines as measured on a P-III (Coppermine), P4, and
> an Athlon MP. Times are posted for both a pure-C version of each
> routine, built with gcc, as well as for an SSE version. The sources
> for each are available at
> http://qcdhome.fnal.gov/sse/catalog.html
>
> The results are a mixed bag, with each flavor of processor sometimes first,
> second, or third. I'm using only a small subset of SSE - mostly shufps,
> addps, mulps, with a few xorps, movaps, and movups thrown in. I haven't
> timed individual instructions on all three processors.
>
> Don Holmgren
> Fermilab

Awesomely useful, Don, thanks.

Do you have any idea what the overall marginal benefit is of using your hand-optimized routines when working on large datasets (too big to fit into cache)? In particular, does performance devolve to memory-bandwidth-bound behavior (and hence end up being the same for MILC and SSE and dominated by the memory bus speed)?
> rgb

Of course YMMV, but for our application (molecular dynamics) the impact of SSE is high: a factor of 1.5 for large applications, and even more for smaller applications (see http://www.gromacs.org/benchmarks/scaling.php for comparisons). I should admit that it was very time-consuming to write all that much assembly code (but the guy did it of his own free will).

Groeten, David.
________________________________________________________________________
Dr. David van der Spoel, Biomedical center, Dept. of Biochemistry
Husargatan 3, Box 576, 75123 Uppsala, Sweden
phone: 46 18 471 4205  fax: 46 18 511 755
spoel@xray.bmc.uu.se  spoel@gromacs.org  http://zorn.bmc.uu.se/~spoel
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From rgb at phy.duke.edu Sun Nov 25 10:22:16 2001
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To:
Message-ID:

On Sun, 25 Nov 2001, David van der Spoel wrote:

> On Sun, 25 Nov 2001, Robert G. Brown wrote:
>
> > Do you have any idea what the overall marginal benefit is of using your
> > hand-optimized routines when working on large datasets (too big to fit
> > into cache)? In particular, does performance devolve to
> > memory-bandwidth-bound behavior (and hence end up being the same for
> > MILC and SSE and dominated by the memory bus speed)?
>
> Of course YMMV, but for our application (molecular dynamics) the impact of
> SSE is high: a factor of 1.5 for large applications, and even more for
> smaller applications (see http://www.gromacs.org/benchmarks/scaling.php
> for comparisons). I should admit that it was very time-consuming to write
> all that much assembly code (but the guy did it of his own free will).
>
> Groeten, David.

I've been meaning to go back and play with this -- there must be some way of quantifying the crossover point between CPU-bound and memory-I/O-bound code, and I've got a decent benchmark timing harness at this point that I can use to explore it. It's good to hear that it can yield a real benefit for large data codes though.

rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email: rgb@phy.duke.edu
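One crude way to put a number on that crossover is to time the same arithmetic over working sets on either side of the cache size; once the vectors spill out of cache, the flop rate collapses to whatever the memory bus can feed. A rough, illustrative C sketch (the sizes and the clock()-based timing are arbitrary choices, not anyone's actual harness):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Time y[i] += a*x[i] over vector sizes from cache-resident to far
 * larger than cache; the reported MFLOPS should drop sharply at the
 * CPU-bound to memory-bound transition. */
int main(void)
{
    size_t sizes[] = { 1024, 16384, 262144, 4194304 };  /* in doubles */
    int s;

    for (s = 0; s < 4; s++) {
        size_t n = sizes[s], i;
        long reps = (4194304 / (long)n) * 64 + 1;  /* keep total work similar */
        long r;
        double *x = malloc(n * sizeof *x);
        double *y = malloc(n * sizeof *y);
        double a = 3.0, secs;
        clock_t t0, t1;

        if (!x || !y) return 1;
        for (i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }
        t0 = clock();
        for (r = 0; r < reps; r++)
            for (i = 0; i < n; i++)
                y[i] += a * x[i];
        t1 = clock();
        secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (secs <= 0.0) secs = 1e-9;  /* guard against a too-coarse clock */
        printf("n = %8lu doubles: %7.1f MFLOPS\n",
               (unsigned long)n, 2.0 * n * reps / secs / 1e6);
        free(x);
        free(y);
    }
    return 0;
}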
From okeefe at sistina.com Sun Nov 25 17:04:25 2001
From: okeefe at sistina.com (Matt Okeefe)
Date: Wed Nov 25 01:01:53 2009
Subject: NFS service
In-Reply-To: <1006673108.3c009cd4126d2@intranet.csis.hku.hk>
References: <1006673108.3c009cd4126d2@intranet.csis.hku.hk>
Message-ID: <20011125190425.A16997@sistina.com>

On Sun, Nov 25, 2001 at 03:25:08PM +0800, Anthony Tam wrote:

> Hi all,
>
> I am looking for information regarding the support of
> highly-available or fault-tolerant NFS service on a medium-size
> cluster (> 32 nodes). Any idea where I can find this information?

Anthony,

Mission Critical Linux, among others, sells NFS fail-over software for two servers. Sistina's GFS is a Linux cluster file system that can allow multiple NFS servers to export the same shared file system to a large number of Beowulf clients (this approach allows much more scalability than just a single NFS server; you can read about it in the paper "Accelerating Technical Computing with Sistina's GFS" at www.sistina.com). If you are interested in using NFS to create a shared root partition for diskless workstations, check out the NFS cluster project at Sourceforge: http://clusternfs.sourceforge.net/

I hope this helps.

Matt O'Keefe
Sistina Software, Inc.

> Thanks.
>
> Cheers
>
> Anthony
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From ron_chen_123 at yahoo.com Sun Nov 25 20:03:35 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:53 2009
Subject: NFS service
In-Reply-To: <1006673108.3c009cd4126d2@intranet.csis.hku.hk>
Message-ID: <20011126040335.8681.qmail@web14702.mail.yahoo.com>

If you need a really reliable solution, you should use Sun Cluster:
http://www.sun.com/clusters/index.jhtml

Otherwise, if you need something cheap, maybe you can hack around with Linux-HA, with NFS over CFS.
http://www.linux-ha.org/

-Ron

--- Anthony Tam wrote:
> Hi all,
>
> I am looking for information regarding the support of
> highly-available or fault-tolerant NFS service on a medium-size
> cluster (> 32 nodes). Any idea where I can find this information?
> Thanks.
>
> Cheers
>
> Anthony
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From gabriel.weinstock at dnamerican.com Mon Nov 26 06:11:07 2001
From: gabriel.weinstock at dnamerican.com (Gabriel J. Weinstock)
Date: Wed Nov 25 01:01:53 2009
Subject: ask
In-Reply-To: <20011124165004.68297.qmail@web14706.mail.yahoo.com>
References: <20011124165004.68297.qmail@web14706.mail.yahoo.com>
Message-ID: <14250523411684@DNAMERICAN.COM>

I would suspect it would not work, although I don't have a well-thought-out reason why. I know that you can't, for instance, run svgalib programs with MPI (at least LAM-MPI). Wouldn't you need something like XMPI (LAM) or MPE (MPICH) to do graphical output?

Gabe

On Saturday 24 November 2001 11:50 am, Ron Chen wrote:
> What kind of problem did you encounter?
>
> Does "mpiCC -L<path to the Qt lib> -l<Qt lib>" work?
>
> -Ron
>
> --- thanh wrote:
> > Dear,
> > When programming with MPICH, I want to include some classes
> > of the Qt lib; could you show me the way to do it?
> > Help me!
> > Thanks
>
> __________________________________________________
> Do You Yahoo!?
> Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
> http://geocities.yahoo.com/ps/info1
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
From joe.griffin at mscsoftware.com Mon Nov 26 06:43:42 2001
From: joe.griffin at mscsoftware.com (Joe Griffin)
Date: Wed Nov 25 01:01:53 2009
Subject: 32 bit vs 64 bit computer ?
References: <001d01c173ef$a16ae600$5f72f2cb@LocalHost>
Message-ID: <3C02551E.8FA0D582@mscsoftware.com>

Yoon Jae Ho,

> I want to know the exact definition of
> the 32 bit computer (PC) vs 64 bit computer,
> and why we can't make a 128 bit computer for
> a long time?

The term "64 bit computer" is usually used for one of two types:

LP64 ..... longs and pointers are 64 bits (an example is an Intel Itanium).
ILP64 .... integers, reals, longs and pointers are 64 bits (an example is a CRAY).

LP64 systems allow for high address ranges. ILP64 allows for a high address range and greater accuracy of calculations.

To answer why we can't make a 128 bit computer, I must ask: why would you want to? 2^128 is a very big number. I cannot see the need for either that much address space or that much precision.

> I don't know how much (the maximum number)
> the 32 bit computer vs 64 bit makes exact
> calculation without error.
> And with different architecture PCs - for
> example AMD, Intel, Mac CPU - is it possible
> to communicate the calculation results to each other?

On a 32 bit system like Intel/AMD chips, real data uses the following:

1 bit ..... sign
8 bits .... exponent (magnitude of number)
23 bits ... mantissa (accuracy of number)

On a 32 bit system you may have 64 bit reals. If so:

1 bit ..... sign
11 bits ... exponent
52 bits ... mantissa

> And with the same OS - for example Linux - is it possible
> to make one beowulf using Alpha (64 bit) & Intel (32 bit) computers?

I believe a strict definition of beowulf is commodity off-the-shelf systems. I don't think Alpha is included there. But lots of people mean "cluster" when they say beowulf. You can cluster Alpha and Intel systems, but using them together is dependent on the software.

> I mean, can we communicate the calculation results with
> each other (32 bit vs 64 bit) during calculation with the same O.S.?

During the calculations??? I think not.

Regards,
Joe
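The sign/exponent/mantissa breakdown above can be poked at directly. A small, illustrative C sketch that pulls the three IEEE-754 fields out of a 32-bit float (it assumes "unsigned int" is 32 bits and shares the float's byte order, which holds on the machines discussed here):

#include <stdio.h>
#include <string.h>

/* Decompose a float into the 1/8/23-bit fields described above. */
int main(void)
{
    float f = -6.25f;
    unsigned int bits;

    memcpy(&bits, &f, sizeof bits);   /* reinterpret the bytes, no conversion */
    printf("value    = %g\n", f);
    printf("sign     = %u\n", (bits >> 31) & 0x1u);
    printf("exponent = %u (biased by 127)\n", (bits >> 23) & 0xffu);
    printf("mantissa = 0x%06x\n", bits & 0x7fffffu);
    return 0;
}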
From lmeerkat at yahoo.com Mon Nov 26 14:51:43 2001
From: lmeerkat at yahoo.com (L. Gritsenko)
Date: Wed Nov 25 01:01:53 2009
Subject: How to add Beowulf node with SCSI HD?
Message-ID: <20011126225143.27491.qmail@web20604.mail.yahoo.com>

Hi,

I am using Scyld Beowulf 27bz-8. I booted a node that has a SCSI hard drive from a master that has an IDE hard drive. After the node was set up to the "boot" state, I received the following message in the log file: "/dev/hda: No such device". Yes, it is correct that I do not have any "hda" on this node, but I do still have "/dev/sda" there! What do I need to change in the boot procedure in order to solve this problem? I believe I can add a node which has a SCSI hard drive.

Thanks,
Lyudmila Gritsenko

=====

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From vanw at tticluster.com Tue Nov 27 08:41:37 2001
From: vanw at tticluster.com (Kevin Van Workum)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
In-Reply-To: <5.0.2.1.0.20011121113440.00a208b0@pop3.norton.antivirus>
Message-ID: <001c01c17762$64ce8140$63b36880@aframe>

If you'd like to benchmark Lahey Fortran with MPICH on a 1.3 GHz AMD cluster with DDR RAM, check out these sites:

www.tsunamictechnologies.com
www.lahey.com

Kevin Van Workum
University of Wisconsin

> -----Original Message-----
> From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org] On
> Behalf Of 4j harrop
> Sent: Wednesday, November 21, 2001 1:42 PM
> To: Beowulf mailing list
> Subject: FORTRAN compilers
>
> Hi, I've been a lurker on this list for some time. The conversations here
> have been most helpful while I've been working on getting up to speed. I
> have recently built a small beowulf cluster and am now looking at getting a
> FORTRAN90 compiler. Can anyone on the list recommend which are better for
> Linux (Redhat 7.2) using mpich (1.2.2.3)?
>
> If you have negative comments that you would rather not publish to the
> list, please contact me directly at jharrop@shaw.ca
>
> Thanks in advance!
>
> John Harrop
>
> Adapt Systems Corp
> Cyberquest Geoscience Ltd
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

From becker at scyld.com Tue Nov 27 10:11:37 2001
From: becker at scyld.com (Donald Becker)
Date: Wed Nov 25 01:01:53 2009
Subject: How to add Beowulf node with SCSI HD?
In-Reply-To: <20011126225143.27491.qmail@web20604.mail.yahoo.com>
Message-ID:

On Mon, 26 Nov 2001, L. Gritsenko wrote:

> I am using Scyld Beowulf 27bz-8. I booted a node that
> has a SCSI hard drive from a master that has an IDE hard
> drive. After the node was set up to the "boot"
> state, I received the following message in the log
> file: "/dev/hda: No such device".

This is a harmless message. To avoid seeing it, comment out the 'hdparm' call in the node_up script. (The base script is in /etc/beowulf/node_up, but that script just calls the /usr/lib/beoboot/bin/node_up script.) Newer releases take care not to emit this confusing message.

Donald Becker                          becker@scyld.com
Scyld Computing Corporation            http://www.scyld.com
410 Severn Ave. Suite 210              Second Generation Beowulf Clusters
Annapolis MD 21403                     410-990-9993

From mlrecv at yahoo.com Tue Nov 27 11:27:52 2001
From: mlrecv at yahoo.com (Zhifeng Chen)
Date: Wed Nov 25 01:01:53 2009
Subject: SMP support comparison between NT and Linux
Message-ID: <20011127192752.1050.qmail@web14810.mail.yahoo.com>

Hi,

Are there any review articles or comments comparing SMP support between NT and Linux? Which is better?

ZF

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From xfye at mail.ustc.edu.cn Tue Nov 27 18:15:47 2001
From: xfye at mail.ustc.edu.cn (XianFeng Ye)
Date: Wed Nov 25 01:01:53 2009
Subject: about SCSI HD & F90
In-Reply-To: <200111271703.fARH3W025958@blueraja.scyld.com>
Message-ID:

> I am using Scyld Beowulf 27bz-8. I booted a node that
> has a SCSI hard drive from a master that has an IDE hard
> drive. After the node was set up to the "boot"
> state, I received the following message in the log
> file: "/dev/hda: No such device".
Yes, it is. Maybe you can do it like this:

ln -sf /dev/sda /dev/hda

> > FORTRAN90 compiler. Can anyone on the list recommend which are better
> > for Linux (Redhat 7.2) using mpich (1.2.2.3)?

Maybe pgfortran can do this. I am puzzled that when I compile an f77 program with pgf77 (3.2) and g77 (2.95), pgf77 can't surpass g77. Can someone comment on this?

From scheinin at crs4.it Wed Nov 28 02:48:49 2001
From: scheinin at crs4.it (Alan Scheinine)
Date: Wed Nov 25 01:01:53 2009
Subject: AMD 760 MPX ?
Message-ID: <200111281048.LAA10712@dylandog.crs4.it>

I have been trying to avoid polluting this newsgroup with a useless question, but I cannot contain myself any longer. In a very nice article by Anand Lal Shimpi written on 5 June 2001, we can read "Don't expect too many manufacturers other than Tyan to have a board [with the 760MPX] until mid-late Q3 2001." (On the copy I printed for myself I do not see the URL.) Someone else wrote around the start of November that the 760MPX would be announced in mid-November. Any news?

best regards,
Alan Scheinine   Email: scheinin@crs4.it

From lindahl at conservativecomputer.com Wed Nov 28 03:17:08 2001
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Wed Nov 25 01:01:53 2009
Subject: AMD 760 MPX ?
In-Reply-To: <200111281048.LAA10712@dylandog.crs4.it>; from scheinin@crs4.it on Wed, Nov 28, 2001 at 11:48:49AM +0100
References: <200111281048.LAA10712@dylandog.crs4.it>
Message-ID: <20011128061708.A4935@wumpus.foo>

On Wed, Nov 28, 2001 at 11:48:49AM +0100, Alan Scheinine wrote:

> Someone else wrote around the start of November that the 760MPX
> would be announced in mid-November. Any news?

I saw 2 vendors at the SC2001 show in Denver with running 760MPX machines. Neither was the final product, however.

greg

From aleahy at knox.edu Wed Nov 28 05:14:35 2001
From: aleahy at knox.edu (Andrew Leahy)
Date: Wed Nov 25 01:01:53 2009
Subject: AMD 760 MPX ?
References: <200111281048.LAA10712@dylandog.crs4.it>
Message-ID: <3C04E33B.B53C035@knox.edu>

Alan Scheinine wrote:
>
> I have been trying to avoid polluting this newsgroup with
> a useless question, but I cannot contain myself any longer.
> In a very nice article by Anand Lal Shimpi written on 5 June 2001,
> we can read "Don't expect too many manufacturers other than Tyan
> to have a board [with the 760MPX] until mid-late Q3 2001." (On
> the copy I printed for myself I do not see the URL.) Someone
> else wrote around the start of November that the 760MPX would be
> announced in mid-November. Any news?
> best regards,
> Alan Scheinine   Email: scheinin@crs4.it

There was a post about this at 2cpu.com recently (a good place for dual-processor news/rumors). The link they point to is:

http://www.theinquirer.org/27110112.htm

But I've been reading these "they're almost here" articles for a while now, so take it with a grain of salt.

Andrew Leahy
aleahy@knox.edu

From jared_hodge at iat.utexas.edu Wed Nov 28 07:29:18 2001
From: jared_hodge at iat.utexas.edu (Jared Hodge)
Date: Wed Nov 25 01:01:53 2009
Subject: Channel Bonding Question
Message-ID: <3C0502CE.95EF747A@iat.utexas.edu>

I was wondering if it is possible to link two ethernet NICs (channel bonding, sort of) on our server to work together talking to a single switch. I've lately come to realize that most work with channel bonding requires two entirely separate networks, but what I want to do is connect the two NICs to the switch (Cisco Catalyst) and allow it to effectively communicate with two of the nodes at full speed at the same time.
I guess that this would be more along the lines of line trunking or multi-link or some other networking scheme. If anyone knows of any links that describe how to do this, I would appreciate it. Thanks.

--
Jared Hodge
Institute for Advanced Technology
The University of Texas at Austin
3925 W. Braker Lane, Suite 400
Austin, Texas 78759
Phone: 512-232-4460  Fax: 512-471-9096
Email: Jared_Hodge@iat.utexas.edu

From ak at dkp.com Wed Nov 28 08:25:39 2001
From: ak at dkp.com (Andrew Klaassen)
Date: Wed Nov 25 01:01:53 2009
Subject: Channel Bonding Question
In-Reply-To: <3C0502CE.95EF747A@iat.utexas.edu>
References: <3C0502CE.95EF747A@iat.utexas.edu>
Message-ID: <20011128112539.D2508@dkp.com>

On Wed, Nov 28, 2001 at 09:29:18AM -0600, Jared Hodge wrote:

> I was wondering if it is possible to link two ethernet NICs
> (channel bonding, sort of) on our server to work together
> talking to a single switch. I've lately come to realize that
> most work with channel bonding requires two entirely separate
> networks, but what I want to do is connect the two NICs to
> the switch (Cisco Catalyst) and allow it to effectively
> communicate with two of the nodes at full speed at the same
> time. I guess that this would be more along the lines of line
> trunking or multi-link or some other networking scheme. If
> anyone knows of any links that describe how to do this, I
> would appreciate it. Thanks.

No link, but here are the config files we need in order to make this work on a Redhat box (from the /etc/sysconfig/network-scripts directory):

---ifcfg-bond0---
DEVICE=bond0
BOOTPROTO=static
BROADCAST=192.168.0.255
IPADDR=192.168.0.181
NETMASK=255.255.255.0
NETWORK=192.168.0.0
ONBOOT=yes

---ifcfg-eth0---
DEVICE=eth0
BOOTPROTO=static
MASTER=bond0
SLAVE=yes
ONBOOT=yes

---ifcfg-eth1---
DEVICE=eth1
BOOTPROTO=static
MASTER=bond0
SLAVE=yes
ONBOOT=yes

And, in /etc/modules.conf:

alias bond0 bonding

The switch also needs to be set up for this. We've got an HP and a Foundry switch both doing it; one calls it "Fast EtherChannel" (originally a Cisco term?), the other "Trunking", and the Linux box "bonding". Setup was pretty straightforward once I figured out where in the switch manuals everything was...

Andrew Klaassen

From josip at icase.edu Wed Nov 28 09:50:46 2001
From: josip at icase.edu (Josip Loncaric)
Date: Wed Nov 25 01:01:53 2009
Subject: Xbox clusters?
Message-ID: <3C0523F6.254E0EE9@icase.edu>

Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about $300. This would make it a reasonably powerful cluster node with an excellent price/performance ratio. Of course, the thing runs a slimmed-down variant of Windows 2000 instead of Linux, but has anyone discussed making an Xbox cluster?

Sincerely,
Josip

P.S. Sony's PS2 Linux Beta Kit was announced this April in Japan (see http://ps2.ign.com/news/33873.html or http://www.zdnet.com/zdnn/stories/news/0,4586,2712751,00.html). Xbox will probably not see anything similar from Microsoft. Other game boxes may be less powerful, but may have better prospects with Linux.

--
Dr. Josip Loncaric, Research Fellow               mailto:josip@icase.edu
ICASE, Mail Stop 132C         PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric@larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192    Fax +1 757 864-6134

From rgb at phy.duke.edu Wed Nov 28 10:37:58 2001
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed Nov 25 01:01:53 2009
Subject: Xbox clusters?
In-Reply-To: <3C0523F6.254E0EE9@icase.edu>
Message-ID:

On Wed, 28 Nov 2001, Josip Loncaric wrote:

> Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory,
> a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about
> $300. This would make it a reasonably powerful cluster node with an
> excellent price/performance ratio. Of course, the thing runs a
> slimmed-down variant of Windows 2000 instead of Linux, but has anyone
> discussed making an Xbox cluster?
>
> Sincerely,
> Josip

Dear Josip,

Case                  $60
Motherboard          $100
Athlon XP 1500       $150
256 MB PC2100 DDR     $40
100BT NIC             $20
=========================
Total                $370, with optional small HD and video $500-550.

Even assuming no better than direct clock speed scaling between the 1.4 GHz 1500 and the 733 MHz PIII, even ignoring the scalability and manageability and parallel software support advantages of linux, even ignoring the speed advantages of 256 MB of DDR over 64 MB of SDRAM, even ignoring Amdahl's law (where one cpu at speed 2X is generally "better" than two cpus at speed X), this still makes no economic sense, in that the aggregate 1467 MHz / 1400 MHz = 1.05 but $600/$500 = 1.2.

And you get to run linux. And you get the DDR. And you get 2-3x the HD disk. And you don't have to run Windows or add to the greatest/worst monopoly the world has ever seen. And you get to choose your NIC. And you get to run linux.

I doubt it is worth it even for $250/node. Perhaps $200.

rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email: rgb@phy.duke.edu

From jared_hodge at iat.utexas.edu Wed Nov 28 10:41:32 2001
From: jared_hodge at iat.utexas.edu (Jared Hodge)
Date: Wed Nov 25 01:01:53 2009
Subject: Channel Bonding Question
References: <3C0502CE.95EF747A@iat.utexas.edu> <3C052E53.34C3A9A0@obs.unige.ch>
Message-ID: <3C052FDC.4EE5BF3D@iat.utexas.edu>

I thought that might be the case, but I've heard of software (Cisco trunking, I think) that can create a virtual IP for a NIC that doesn't exist; when something is sent to it, the software splits it across the two NICs and reassembles it there. I think you have to make sure nothing goes to the original IPs, though.

Daniel Pfenniger wrote:
>
> Jared Hodge wrote:
> >
> > I was wondering if it is possible to link two ethernet NICs (channel
> > bonding, sort of) on our server to work together talking to a single
> > switch. I've lately come to realize that most work with channel bonding
> > requires two entirely separate networks, but what I want to do is
> > connect the two NICs to the switch (Cisco Catalyst) and allow it to
> > effectively communicate with two of the nodes at full speed at the same
> > time. I guess that this would be more along the lines of line trunking
> > or multi-link or some other networking scheme. If anyone knows of any
> > links that describe how to do this, I would appreciate it. Thanks.
>
> In that case each NIC must have one (or more) distinct IP number, so your
> applications should be able to manage that.
>
> Dan

--
Jared Hodge
Institute for Advanced Technology
The University of Texas at Austin
3925 W. Braker Lane, Suite 400
Austin, Texas 78759
Phone: 512-232-4460  Fax: 512-471-9096
Email: Jared_Hodge@iat.utexas.edu

From math at velocet.ca Wed Nov 28 11:04:55 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:54 2009
Subject: Xbox clusters?
In-Reply-To: <3C0523F6.254E0EE9@icase.edu>; from josip@icase.edu on Wed, Nov 28, 2001 at 12:50:46PM -0500
References: <3C0523F6.254E0EE9@icase.edu>
Message-ID: <20011128140455.E1210@velocet.ca>

On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all...

> Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory,
> a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about
> $300. This would make it a reasonably powerful cluster node with an
> excellent price/performance ratio. Of course, the thing runs a
> slimmed-down variant of Windows 2000 instead of Linux, but has anyone
> discussed making an Xbox cluster?

Why bother when for about $300 USD you can put together a cluster node with a 1.333 GHz Athlon with 256 MB of DDR RAM?

'Sides, who brought 'price/performance' onto this list? Don't you know that's never a factor on the beowulf list? :)

/kc

> Sincerely,
> Josip
>
> P.S. Sony's PS2 Linux Beta Kit was announced this April in Japan
> (see http://ps2.ign.com/news/33873.html or
> http://www.zdnet.com/zdnn/stories/news/0,4586,2712751,00.html). Xbox
> will probably not see anything similar from Microsoft. Other game boxes
> may be less powerful, but may have better prospects with Linux.
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Ken Chase, math@velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA

From joseph.keen at eglin.af.mil Wed Nov 28 11:19:12 2001
From: joseph.keen at eglin.af.mil (Keen Joseph M Contr 46 SK/SKE)
Date: Wed Nov 25 01:01:54 2009
Subject: Scyld boot problem
Message-ID: <0FA55B4C91D3D411BDD4009027724DDAE4DCE9@eg-002-009.eglin.af.mil>

Greetings,

I'm looking for some help on a problem we're having with getting the demo Scyld distribution working on our cluster. I'm not the admin for the cluster and will probably omit some critical information on the first pass, so please bear with me.

The cluster configuration consists of 8 single-processor nodes and 8 dual-processor nodes. The single-cpu nodes boot without problem. The dual-cpu nodes do not. The screen information indicates a problem after the partition check at the "end of phase 1". The following message appears:

Invalid session number or type of track
Kernel panic: VFS: Unable to mount root fs on 03:05
Rebooting in 30 seconds ...

This results in a continuous boot loop. We get this same result for each of the dual-cpu nodes. Any ideas/suggestions?

Thanks,
Joe

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20011128/8fdf8727/attachment.html

From math at velocet.ca Wed Nov 28 11:40:19 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:54 2009
Subject: Xbox clusters?
> > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about > > $300. This would make it a reasonably powerful cluster node with an > > excellent price/performance ratio. Of course, the thing runs a > > slimmed-down variant of Windows 2000 instead of Linux, but has anyone > > discussed making an Xbox cluster? > > Why bother when for about $300 USD you can put together a > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > Sides, who brought 'price/performance' onto this list? Dont know thats never a > factor on the beowulf list? :) So, the question is, with these numbers, how do people end up spending $250K on 40 or even 60-CPU clusters? /kc From bargle at umiacs.umd.edu Wed Nov 28 11:42:37 2001 From: bargle at umiacs.umd.edu (Gary Jackson) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? In-Reply-To: Your message of "Wed, 28 Nov 2001 14:04:55 EST." <20011128140455.E1210@velocet.ca> Message-ID: <200111281942.OAA16730@leviathan.umiacs.umd.edu> On Wed, 28 Nov 2001, Velocet wrote: >Why bother when for about $300 USD you can put together a >cluster node with a 1.333GHz athlon with 256Mb of DDR ram? Because you don't have to "pay" for assembly, or debugging the equipment, or anything like that. You even get a 90 day warranty. With a self assembled beige box, it may take you 90 days to figure out which part is broken. -- Gary Jackson bargle@umiacs.umd.edu From dwu at Swales.com Wed Nov 28 11:48:58 2001 From: dwu at Swales.com (Dominic Wu) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? In-Reply-To: <3C0523F6.254E0EE9@icase.edu> Message-ID: It runs an XP variant and the RAM seems to be a bit on the low side. -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Josip Loncaric Sent: Wednesday, November 28, 2001 9:51 AM To: Beowulf mailing list Subject: Xbox clusters? Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about $300. This would make it a reasonably powerful cluster node with an excellent price/performance ratio. Of course, the thing runs a slimmed-down variant of Windows 2000 instead of Linux, but has anyone discussed making an Xbox cluster? Sincerely, Josip P.S. Sony's PS2 Linux Beta Kit has been announced this April in Japan (see http://ps2.ign.com/news/33873.html or http://www.zdnet.com/zdnn/stories/news/0,4586,2712751,00.html). Xbox will probably not see anything similar from Microsoft. Other game boxes may be less powerful, but may have better prospects with Linux. -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wsb at paralleldata.com Wed Nov 28 12:25:37 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? References: <3C0523F6.254E0EE9@icase.edu> <20011128140455.E1210@velocet.ca> <20011128144018.G1210@velocet.ca> Message-ID: <3C054841.21352D4D@paralleldata.com> Velocet wrote: > > On Wed, Nov 28, 2001 at 02:04:55PM -0500, Velocet's all... 
> > On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all... > > > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, > > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about > > > $300. This would make it a reasonably powerful cluster node with an > > > excellent price/performance ratio. Of course, the thing runs a > > > slimmed-down variant of Windows 2000 instead of Linux, but has anyone > > > discussed making an Xbox cluster? > > > > Why bother when for about $300 USD you can put together a > > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > > > Sides, who brought 'price/performance' onto this list? Dont know thats never a > > factor on the beowulf list? :) > > So, the question is, with these numbers, how do people end up spending > $250K on 40 or even 60-CPU clusters? > They buy from IBM/Compaq/HP or pick your favorite mainstream vendor. It's unusual for a large corp to be off putting it's own PC's together. Wes From j.c.burton at gats-inc.com Wed Nov 28 12:50:13 2001 From: j.c.burton at gats-inc.com (John Burton) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? References: <200111281942.OAA16730@leviathan.umiacs.umd.edu> Message-ID: <3C054E04.83608604@gats-inc.com> Gary Jackson wrote: > On Wed, 28 Nov 2001, Velocet wrote: > > >Why bother when for about $300 USD you can put together a > >cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > Because you don't have to "pay" for assembly, or debugging the > equipment, or anything like that. You even get a 90 day warranty. > With a self assembled beige box, it may take you 90 days to figure out > which part is broken. Ummmm....speak for yourself. I've been putting together these "self assembled beige box" for many years and currently have about 5% component DOA rate, and about another 1% infant mortality rate (crap out within 30 days). Takes on average 4 hours to determine what the bad component is and 24-48 hours to replace it. I've never spent more than 1 week "figuring out" which part is broken. The time I spent 1 week was due to a flakey memory chip that was causing filesystem errors in a 90GB RAID 5 array. Flakey memory is difficult to track down because it can masquerade as virtually anything else... With the current components, you put it together and it either works or doesn't. If it doesn't you can usually zero in on the problem pretty quickly... buy quality components that you know work together and your job is even easier John From math at velocet.ca Wed Nov 28 12:58:35 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? In-Reply-To: <20011128155517.A27265@sauerburger.nrl.navy.mil>; from stephan@sauerburger.nrl.navy.mil on Wed, Nov 28, 2001 at 03:55:17PM -0500 References: <3C0523F6.254E0EE9@icase.edu> <20011128140455.E1210@velocet.ca> <20011128155517.A27265@sauerburger.nrl.navy.mil> Message-ID: <20011128155835.I1210@velocet.ca> On Wed, Nov 28, 2001 at 03:55:17PM -0500, Stephan Sauerburger's all... > Where at? Pricewatch? And does that include HDD? ya you can get them for under $100USD. go check out pricewatch and find a store you can buy the whole kit from. (considering my designs I mighta left the case out tho. We rack our stuff into custom cabinets). /kc > > > ~Stephan > > > On Wed, Nov 28, 2001 at 02:04:55PM -0500, Velocet wrote: > > On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all... 
> > > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, > > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about > > > $300. This would make it a reasonably powerful cluster node with an > > > excellent price/performance ratio. Of course, the thing runs a > > > slimmed-down variant of Windows 2000 instead of Linux, but has anyone > > > discussed making an Xbox cluster? > > > > Why bother when for about $300 USD you can put together a > > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > > > Sides, who brought 'price/performance' onto this list? Dont know thats never a > > factor on the beowulf list? :) > > > > /kc > > > > > Sincerely, > > > Josip > > > > > > P.S. Sony's PS2 Linux Beta Kit has been announced this April in Japan > > > (see http://ps2.ign.com/news/33873.html or > > > http://www.zdnet.com/zdnn/stories/news/0,4586,2712751,00.html). Xbox > > > will probably not see anything similar from Microsoft. Other game boxes > > > may be less powerful, but may have better prospects with Linux. > > > > > > > > > -- > > > Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu > > > ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ > > > NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov > > > Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 > > > _______________________________________________ > > > Beowulf mailing list, Beowulf@beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > > Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From rgb at phy.duke.edu Wed Nov 28 13:12:57 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? In-Reply-To: <20011128144018.G1210@velocet.ca> Message-ID: On Wed, 28 Nov 2001, Velocet wrote: > On Wed, Nov 28, 2001 at 02:04:55PM -0500, Velocet's all... > > On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all... > > > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, > > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about > > > $300. This would make it a reasonably powerful cluster node with an > > > excellent price/performance ratio. Of course, the thing runs a > > > slimmed-down variant of Windows 2000 instead of Linux, but has anyone > > > discussed making an Xbox cluster? > > > > Why bother when for about $300 USD you can put together a > > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > > > Sides, who brought 'price/performance' onto this list? Dont know thats never a > > factor on the beowulf list? :) > > So, the question is, with these numbers, how do people end up spending > $250K on 40 or even 60-CPU clusters? Well, start with $300 rackmount cases (a rackmount case alone can easily cost more than an Xbox). Add a high end P4 motherboard, the fastest P4-Xeon, and fully populate the MoBo with the biggest, most expensive RDRAM sticks you can find. Get a big, fast SCSI drive and controller. Finish off with the fastest network you can arrange. 
The high speed network alone can cost $2K/node, and one can easily enough spend $2K on a rackmount P4 node (exclusive of the high-speed network). Besides, a lot of the top-end numbers are (or at any rate were) generated by alpha/myrinet clusters, where individual nodes could easily run $6K, with discount, NOT including the network, maybe $8K/node including the network. One could drop more than $500K on a 64 node cluster without even breaking a sweat. Note that this sort of high end cluster was (and really still is) appropriate for moderately fine-grained parallel computations, where one needs to spend proportionally much more for the network than usual, and where the fastest possible processors with the fastest and biggest memory can help control the ratio of serial code fraction to parallel code fraction, allowing one to actually scale an application UP to 64 nodes. Yes, one might be able to afford 1000 AMD nodes on some agglomeration of daisy chained switches for the same $500K (if you could afford to house and feed them given that they would consume some 100 KW or more in operation). Yes, those 1000 nodes might have 2-3x the aggregate power of the really expensive cluster for the same money. However, if >>your<< problem only scales to 6 nodes with that ratio of CPU speed to network speed, the giant AMD cluster is obviously not smart. There is a tremendous range of variation in cluster designs, with all sorts of mixes of investment in node speed, memory speed, network topology and speed, and while the "standard recipe" beowulfish cluster (pile of PC's, switched 100BT, linux) is right for some (indeed, right for me:-) it isn't right for everybody. So Josip's question was really relevant and one that we've kicked around on this list some before -- one day game systems may well be viable candidates as nodes. I don't think the Xbox is there yet. The new/future Sonies may be, but I'm not so certain. The problem is: All PC's can play games, many of them as well or better than a dedicated gaming box. PC's can do much more -- they are general purpose. The parts for a PC are all commodity and largely interchangeable. These factors conspire to keep PC's as powerful AND cheap as they can reasonably be. Game boxes nowadays have to be able to do nearly everything a PC can do -- a motherboard with integrated graphics, sound and network is just about a game box on a board, lacking only an operating system and some I/O channels. There is such a small and narrowing window in between these two extremes that I'm not at all convinced that there will EVER be an advantage in using game systems as nodes. By the time they have the features and expandability of a PC-based node, they will necessarily reach the PC in price point or somebody will just repackage the node and sell it as a PC (and so reach the PC in price point). Anyway, over the many years I've seen "thin" or "special purpose" systems of all sorts come with much hooraw and seen them go again like thieves in the night, with most souls sorry they ever bought them. The general purpose cost/benefit sweet spot is right in the middle of the PC commodity market because market forces evolve it that way, and only rarely does a processor based "computational" design (excluding the vast world of controllers) come along that really can sustain a special purpose market let alone be backportable to general purpose use. This is the Lesson of the Wang. (At least for those of you old enough to remember what one is...:-) rgb -- Robert G. 
Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From ak at dkp.com Wed Nov 28 13:19:58 2001 From: ak at dkp.com (Andrew Klaassen) Date: Wed Nov 25 01:01:54 2009 Subject: Channel Bonding Question In-Reply-To: <20011128112539.D2508@dkp.com> References: <3C0502CE.95EF747A@iat.utexas.edu> <20011128112539.D2508@dkp.com> Message-ID: <20011128161956.H2508@dkp.com> On Wed, Nov 28, 2001 at 11:25:39AM -0500, I wrote: > On Wed, Nov 28, 2001 at 09:29:18AM -0600, > Jared Hodge wrote: > > I was wondering if it is possible to link two ethernet NICs > > (channel bonding, sort of) on our server to work together > > talking to a single switch. I've lately come to realize that > > most work with channel bonding > > requires two entirely separate networks, but what I want to do is > > connect the two NICs two > > the switch (Cisco Catalyst) and allow it to effectively > > communicate with two of the nodes at full speed at the same > > time. I guess that this would be more along the lines of line > > trunking or multi-link or some other networking scheme. If > > anyone knows of any links that describe how to do this, I > > would appreciate it. Thanks. > No link, but here are the config files we need in order to make > this work on a Redhat box... Ah - I had a chance to look through the Redhat startup scripts, and it looks like all you need on the Linux box side of things is ifenslave. From the ifenslave manpage:

  # modprobe bonding
  # ifconfig bond0 192.168.0.1 netmask 255.255.0.0
  # ifenslave bond0 eth0 eth1

Hope that helps. Andrew Klaassen From josip at icase.edu Wed Nov 28 13:23:32 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? References: Message-ID: <3C0555D4.56F3CA99@icase.edu> "Robert G. Brown" wrote: > > On Wed, 28 Nov 2001, Josip Loncaric wrote: > > > has anyone discussed making an Xbox cluster? > > I doubt it is worth it even for $250/node. Perhaps $200. You may be right. A cluster node does not need the Xbox-style fancy graphics, DVD drive, nor (sometimes) the hard drive, but it would need more memory and more software flexibility. However, the appeal of buying compact preconfigured CPU+RAM+NIC building blocks remains... BTW, the big monopolist selling the Xbox is supposedly losing $100 per unit, which they won't recover from people who play "Linux cluster games" instead of buying the usual crash-burn-maim commercial fare. Linux-on-Xbox idea is unlikely to get any help, even though Sony's Linux-on-PS2 was tried in Japan. Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From patrick at myri.com Wed Nov 28 13:23:27 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? References: <200111281048.LAA10712@dylandog.crs4.it> Message-ID: <3C0555CF.1C29FD19@myri.com> Hi Alan, Alan Scheinine wrote: > the copy I printed for myself I do not see the URL.) Someone > else wrote around the start of November that the 760MPX will be > announced in mid-November. Any news? We have received two machines for tests from AMD 3 weeks ago, and we took them to SC01. I don't know about AMD's schedule for official release. I can only say that we are very (VERY) pleased with these boxes.
I think it will be a best choice for a lot of clusters. Patrick From rgb at phy.duke.edu Wed Nov 28 13:47:36 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? In-Reply-To: <200111281942.OAA16730@leviathan.umiacs.umd.edu> Message-ID: On Wed, 28 Nov 2001, Gary Jackson wrote: > On Wed, 28 Nov 2001, Velocet wrote: > > >Why bother when for about $300 USD you can put together a > >cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > Because you don't have to "pay" for assembly, or debugging the > equipment, or anything like that. You even get a 90 day warranty. > With a self assembled beige box, it may take you 90 days to figure out > which part is broken. Surely you jest. The systems I buy come with a lifetime labor warranty and typically have a year parts warranty. The vendor will assemble them for me basically for free when I buy in bulk or maybe for $50 each if I'm buying only one or two. I generally buy the parts and build them myself in the latter case to save the money. With my trusty electric screwdriver, I can build a system out of component parts in about 30 minutes, and so can pretty nearly anyone on this list. Motherboard screws onto the case. Drives screw onto rails or into popout cages. CPU snaps in, memory snaps in, cards snap in. The hardest single thing is the cabling -- gotta connect all these itty-bitty lines from the case to the motherboard in the right places. Power is simple. Drive cables are simple. Building a lego castle with my sons is MUCH harder. So is assembling a bicycle. Maintenance is usually pretty simple. The parts most likely to fail are the drives (obvious), power supply, and the CPU/motherboard (also obvious). When buying just one system, it does help to have a local service department to play the swap game. If you are buying fifty, though, spending a few dollars more on a set of swap-em parts (or just borrowing them from a known-good system) to determine what is wrong is no big deal and almost never takes more than an hour or two of time. Then, all the parts are >>cheap and readily available<< and one can often fix the system entirely in times ranging from one hour to an afternoon. I'm also reasonably confident that I'll be able to fix the system (for ever decreasing prices) through at least the first 3-5 years of ownership before it becomes no longer worth it. Now how, exactly, are you going to get an Xbox fixed after its 90 days runs out? Is it a bad CPU, dust on the CD drive, a crashed hard disk, a bad power supply, a bad memory chip? No real OS, no diagnostics. Nobody this side of the factory with spare parts for at least part of what could be wrong. You'll end up either playing the swap game (if you are lucky) with whatever parts inside are indeed commodity with even less to go on than you might have with a real computer OR mailing it in for depot repair OR throwing it away. One round of depot repair will likely cost half as much as the system itself -- $50/hour for labor plus parts plus shipping both ways. Throwing it away costs the whole system. Fixing it yourself? Well, which one would YOU rather fix -- a system you built yourself designed to be expandable and easy to fix or a box deliberately engineered to be "closed" to customers and ultimately disposable so they can sell you more? Just my opinion, of course...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From paullu at cs.ualberta.ca Wed Nov 28 13:47:47 2001 From: paullu at cs.ualberta.ca (Paul Lu) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? In-Reply-To: <3C0555CF.1C29FD19@myri.com>; from patrick@myri.com on Wed, Nov 28, 2001 at 04:23:27PM -0500 References: <200111281048.LAA10712@dylandog.crs4.it> <3C0555CF.1C29FD19@myri.com> Message-ID: <20011128144747.V18934@cs.ualberta.ca> Hello: On Wed, Nov 28, 2001 at 04:23:27PM -0500, Patrick Geoffray wrote: > We have received two machines for tests from AMD 3 weeks ago, > and we took them to SC01. I don't know about AMD's schedule > for official release. I can only say that we are very (VERY) > pleased with these boxes. > > I think it will be a best choice for a lot of clusters. To the extent that you can/are allowed, would you care to comment on how well these boards perform, especially wrt 64-bit/66 MHz Myrinet interfaces? We will be ordering a Myrinet-based cluster shortly and this information would be helpful. Thank you, ...Paul From lindahl at conservativecomputer.com Wed Nov 28 14:06:49 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? In-Reply-To: <20011128144747.V18934@cs.ualberta.ca>; from paullu@cs.ualberta.ca on Wed, Nov 28, 2001 at 02:47:47PM -0700 References: <200111281048.LAA10712@dylandog.crs4.it> <3C0555CF.1C29FD19@myri.com> <20011128144747.V18934@cs.ualberta.ca> Message-ID: <20011128170649.A1825@wumpus.foo> On Wed, Nov 28, 2001 at 02:47:47PM -0700, Paul Lu wrote: > To the extent that you can/are allowed, would you care to comment on > how well these boards perform, especially wrt 64-bit/66 MHz Myrinet > interfaces? Patrick's probably getting tired of saying, "We signed a nondisclosure form." Rest assured that the instant anyone gets an actual release version and isn't under NDA, I'll publish the Myrinet PCI test results on my Myrinet performance webpages. greg From dvos12 at calvin.edu Wed Nov 28 14:11:09 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:54 2009 Subject: custom hardware (was: Xbox clusters?) In-Reply-To: <3C054E04.83608604@gats-inc.com> Message-ID: On Wed, 28 Nov 2001, John Burton wrote: > Ummmm....speak for yourself. I've been putting together these "self > assembled beige box" for many years and currently have about 5% > component DOA rate, and about another 1% infant mortality rate (crap > out within 30 days). Takes on average 4 hours to determine what the > bad component is and 24-48 hours to replace it. I've never spent more > than 1 week "figuring out" which part is broken. The time I spent 1 > week was due to a flakey memory chip that was causing filesystem > errors in a 90GB RAID 5 array. Flakey memory is difficult to track > down because it can masquerade as virtually anything else... There is one computer in our cluster that would make me think twice before doing a custom build. I prefer to call it the node from heck. It only has one problem: it won't boot. If you press the power button, the powerlight flashes while the cpu and case fans turn a quarter turn, then nothing. You have to wait a minute before you even get that reaction again. (Sounds like a short somewhere). The problem only surfaces if the computer has been off for a little while, and nearly every time at that.

1st Occurrence (several months ago). Try new power supply. No go. Remove drives, cards, etc. from motherboard until only (new) PS(power supply), Motherboard, Mem, and CPU. Nope. Swap mem. Nope. Swap CPU. Nope. Sounds like the motherboard (I replaced everything else). I return the original parts (and drop a screwdriver on the motherboard by accident), and it suddenly starts working. I put the computer back in and it runs fine with everything the way it was before.

2nd Occurrence (a month or so later). I knew it was a bad motherboard last time, so I replaced the motherboard. Worked great.

3rd Occurrence (a month or so later). I take things apart and put them back together. Starts working. Now I'm starting to get confused.

4th Occurrence (a month or so later). I remove drives and cards, put in spare PS. Nothing. Remove motherboard and put on a piece of wood with nothing attached but spare PS, CPU, and mem (using a screw driver to short pins instead of power switch). Used a new power cable plugged into a different circuit. Nothing. Try new mem. Get another system and individually check mem, motherboard, cpu. They are all good. Try both PS's in other system and problem follows them. Two bad power supplies -- not too unusual. I replace them, and things run great.

5th Occurrence (recently). I removed all cards, drives from motherboard. Nothing. Tried spare PS. That worked. Unplugged current PS from case, HD, FD, it started working. Put everything back together and it was still working.

Since there is not a single piece of hardware that was present in each case, I feel forced to conclude that there must be something (power cord?) that is breaking the power supplies. I have not seen this problem on any other computers. This is the point at which I would love to put the whole computer back in a box and send it to the reseller. Luckily we never sent back the "bad" motherboard and kept it around as a spare, since it works fine on other systems now. David From patrick at myri.com Wed Nov 28 14:37:32 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? References: <200111281048.LAA10712@dylandog.crs4.it> <3C0555CF.1C29FD19@myri.com> <20011128144747.V18934@cs.ualberta.ca> Message-ID: <3C05672C.E77C9DBF@myri.com> Hi Paul, Paul Lu wrote: > To the extent that you can/are allowed, would you care to comment on > how well these boards perform, especially wrt 64-bit/66 MHz Myrinet > interfaces? Unfortunately, I cannot give this information, it's under NDA. I can just say it's good. It's not easy to make a good 64/66 PCI, and AMD did a good job. I expect the next pre-release to be even better. I will send the results to Greg to publish on his web site as soon as the NDA is over. I can also tell you that my next cluster will definitely be based on this machine. Patrick From math at velocet.ca Wed Nov 28 15:19:49 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:54 2009 Subject: custom hardware (was: Xbox clusters?) In-Reply-To: ; from dvos12@calvin.edu on Wed, Nov 28, 2001 at 05:11:09PM -0500 References: <3C054E04.83608604@gats-inc.com> Message-ID: <20011128181949.K1210@velocet.ca> On Wed, Nov 28, 2001 at 05:11:09PM -0500, David Vos's all... > On Wed, 28 Nov 2001, John Burton wrote: > > Ummmm....speak for yourself. I've been putting together these "self > > assembled beige box" for many years and currently have about 5% > > component DOA rate, and about another 1% infant mortality rate (crap > There is one computer in our cluster that would make me think twice before > doing a custom build. I prefer to call it the node from heck. It only > has one problem: it won't boot.
If you press the power button, the > powerlight flashes while the cpu and case fans turn a quarter turn, then > nothing. You have to wait a minute before you even get that reaction > again. (Sounds like a short somewhere). The problem only surfaces if the > computer has been off for a little while, and nearly every time at that. > Since there is not a single piece of hardware that was present in each > case, I feel forced to conclude that there must be something (power cord?) > that is breaking the power supplies. I have not seen this problem on any > other computers. This is the point at which I would love to put the whole > computer back in a box and send it to the reseller. I saw this EXTREMELY SIMILAR type of situation when I went and fried 3 power supplies in a row trying to boot dual athlons on the Tiger XMP board. :) They ran fine for 1-5 minutes then the power supply blew. Then the power supply would never fully turn on again, just a quarter turn of the fan kinda thing. Those were 300W supplies, and you need 350W's (30A min on +5V, the 300s were 25A, the 350s do 32A) to run the dual board. Now things are fine (enermax 350W supplies are nice). So it might be that... What kinda cpu, how many drives, how much ram and how big are your supplies? Anyway, this kind of event DOESNT excuse the XBOX from having these problems too, except you dont get to return it and you dont get to take it apart to see which particular component piece in which combination displays the problem. /kc -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From beerli at genetics.washington.edu Wed Nov 28 17:03:46 2001 From: beerli at genetics.washington.edu (Peter Beerli) Date: Wed Nov 25 01:01:54 2009 Subject: mpi-prog porting from lam -> scyld beowulf mpi difficulties Message-ID: Hi, I have a program developed using MPI-1 under LAM. It runs fine on several LAM-MPI clusters with different architecture. A user wants to run it on a Scyld-beowulf cluster and there it fails. I did a few tests myself and it seems that the program stalls if run on more than 3 nodes, but seems to work for 2-3 nodes. The program has a master-slave architecture where the master is mostly doing nothing. There are some reports sent to stdout from any node (but this seems to work in beompi the same way as in LAM). There are several things unclear to me because I have no clue about the beompi system, beowulf and scyld in particular.

(1) if I run "top" why do I see 6 processes running when I start with mpirun -np 3 migrate-n ?

(2) The data-phase stalls on the slave nodes. The master node is reading the data from a file and then broadcasts a large char buffer to the slaves. Is this wrong, is there a better way to do that [I do not know how big the data is and it is a complex mix of strings numbers etc.]
void
broadcast_data_master (data_fmt * data, option_fmt * options)
{
  long bufsize;
  char *buffer;
  buffer = (char *) calloc (1, sizeof (char));
  bufsize = pack_databuffer (&buffer, data, options);
  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
  free (buffer);
}

void
broadcast_data_worker (data_fmt * data, option_fmt * options)
{
  long bufsize;
  char *buffer;
  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
  buffer = (char *) calloc (bufsize, sizeof (char));
  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
  unpack_databuffer (buffer, data, options);
  free (buffer);
}

the master and the first node seem to read the data fine but the others either don't and wait or silently die.

(3) what is the easiest way to debug this? With LAM I just attached to the pids in gdb on the different nodes, but here the nodes are transparent to me [but as I said I have never used a beowulf cluster before]. Can you give pointers, hints thanks Peter -- Peter Beerli, Genome Sciences, Box #357730, University of Washington, Seattle WA 98195-7730 USA, Ph:2065438751, Fax:2065430754 http://evolution.genetics.washington.edu/PBhtmls/beerli.html From daniel.pfenniger at obs.unige.ch Thu Nov 29 00:15:15 2001 From: daniel.pfenniger at obs.unige.ch (Daniel Pfenniger) Date: Wed Nov 25 01:01:54 2009 Subject: custom hardware (was: Xbox clusters?) References: Message-ID: <3C05EE93.955F54DF@obs.unige.ch> David Vos wrote: > .... > There is one computer in our cluster that would make me think twice before > doing a custom build. I prefer to call it the node from heck. It only > has one problem: it won't boot. If you press the power button, the > powerlight flashes while the cpu and case fans turn a quarter turn, then > nothing. You have to wait a minute before you even get that reaction > again. (Sounds like a short somewhere). The problem only surfaces if the > computer has been off for a little while, and nearly every time at that. I have seen similar strange behavior of some boxes in a set of 66's, and the way to restart is also rather odd. Basically, and this has been repeatedly observed on several boxes of the same composition (dual Pentium III with ASUS P2BD motherboard) aligned on a metallic shelf, the ATX box would stop after months of activity, and the simplest way we found to restart it is to unplug everything (power and ethernet), touch it for a few seconds with hands, replug and voila. No need to open the box! My guess is that some capacitor needs to be discharged, but exactly why one needs to unplug every cable appears curious. Dan From rauch at inf.ethz.ch Thu Nov 29 02:42:31 2001 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed Nov 25 01:01:54 2009 Subject: Strange hardware (was Re: custom hardware (was: Xbox clusters?)) In-Reply-To: <3C05EE93.955F54DF@obs.unige.ch> Message-ID: On Thu, 29 Nov 2001, Daniel Pfenniger wrote: > I have seen similar strange behavior of some boxes in a set of 66's, > and the way to restart is also rather odd. [...] We recently had strange problems with a Dell-Box which has been working without problems for several years in our small research cluster. It's a dual PII 400 MHz box, but suddenly the Linux kernel was unable to start the second CPU. It could see the second CPU, but when it tried to start it up during boot, it got a timeout and so continued with only one CPU. So we thought that one of the CPUs died and replaced both CPUs. Still the same problem. Next we replaced the motherboard (including the power supply).
Still the same problem. Maybe the disk corrupted the kernel, so we installed a fresh version of the same kernel onto the box. Still the same problem. Only after physically replacing the SCSI hard disk everything was working properly again. We are still wondering why a disk could cause a CPU to timeout during boot... - Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From rcferri at us.ibm.com Thu Nov 29 04:09:20 2001 From: rcferri at us.ibm.com (Richard C Ferri) Date: Wed Nov 25 01:01:54 2009 Subject: Marist Beowulf Setup Message-ID: Hi, Can anyone take a look at Anthony's problem below and help a poor college student building a scyld cluster? thanks, Rich ---------------------- Forwarded by Richard C Ferri/Poughkeepsie/IBM on 11/29/2001 07:08 AM --------------------------- Anthony Sofia on 11/28/2001 12:52:46 PM To: Richard C Ferri/Poughkeepsie/IBM@IBMUS cc: Jose.Arreola@mairst.edu Subject: Marist Beowulf Setup I have a couple of problems/questions that you might be able to help with. (This is all based on scyld) The first problem is the beoserv and bpmaster daemons are binding to -1 instead of an address(192.168.1.1). The nodes are able to get their IP addresses via rarp, but when it tries to connect to the master node(192.168.1.1:1555) to get the second level boot image, the slave nodes stall. When doing a netstat on the master node, it says an established tcp connection exists between .-1:1555 and .0:(some port). During this, no data is being transferred over the network, so I am sceptical that the tcp connection actually exists. I am going to start looking into this, but I thought you might have a quick answer that would make me not have to dig through code and strace output all afternoon. =) I think my other issues can be solved once I have this problem fixed. Thanks for any advice and suggestions you can give me. Anthony Sofia -- anthony@dryhump.net From Mark at MarkAndrewSmith.co.uk Thu Nov 29 05:50:58 2001 From: Mark at MarkAndrewSmith.co.uk (Mark@MarkAndrewSmith.co.uk) Date: Wed Nov 25 01:01:54 2009 Subject: Strange hardware (was Re: custom hardware (was: Xbox clusters?)) Message-ID: <61DC272A66B8D211BA8200105ADF2D3910E71C@SERVER01> Yep, seen this problem many times in our computer hire range of Windows2000Pro machines. The strange thing is that we only see this on Slot 1 Pentium II machines with various model motherboards. All our Pentium III range are socket 370 and no problems. So we came to suspect that the problem was the way in which the Slot 1 Pentium II sits on the motherboard. After months of clients returning equipment to base under warranty, we issued instructions on how to open the case and remove and re-seat the Pentium II Slot 1 processor package. The machines then boot every time after switch on. How many of you having this problem have it with the slot 1 Pentium II and slot 2 Pentium III processors in your clusters? I bet none of you have it with a socket 370 or other "flat" socket type of CPU package. We're fortunate that our development cluster is based on Pentium 233MHz MMX "old" ex-hire equipment so we don't have this problem on the cluster. Yet! Regards, Mark.
-----Original Message----- From: Felix Rauch [SMTP:rauch@inf.ethz.ch] Sent: Thursday 29 November 2001 12:00 To: beowulf@beowulf.org Subject: Strange hardware (was Re: custom hardware (was: Xbox clusters?)) On Thu, 29 Nov 2001, Daniel Pfenniger wrote: > I have seen similar strange behavior of some boxes in a set of 66's, > and the way to restart is also rather odd. [...] We recently had strange problems with a Dell-Box which has been working without problems for several years in our small research cluster. It's a dual PII 400 MHz box, but suddenly the Linux kernel was unable to start the second CPU. It could see the second CPU, but when it tried to start it up during boot, it got a timeout and so continued with only one CPU. So we thought that one of the CPUs died and replaced both CPUs. Still the same problem. Next we replaced the motherboard (including the power supply). Still the same problem. Maybe the disk corrupted the kernel, so we installed a fresh version of the same kernel onto the box. Still the same problem. Only after physically replacing the SCSI hard disk everything was working properly again. We are still wondering why a disk could cause a CPU to timeout during boot... - Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Nov 29 06:01:50 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:54 2009 Subject: Marist Beowulf Setup In-Reply-To: Message-ID: On Thu, 29 Nov 2001, Richard C Ferri wrote: > Anthony Sofia on 11/28/2001 12:52:46 PM > > I have a couple of problems/questions that > you might be able to help with. (This is all based on scyld) > > The first problem is the beoserv and bpmaster daemons are binding > to -1 instead of an address(192.168.1.1). The Scyld Beowulf system has special host names for cluster components.

  .0, .1 ...   Compute (slave) nodes
  .-1          Front-end (master) nodes

Note the leading ".", which makes this a hostname instead of a number. This hostname syntax is a valid local text hostname for library routines. It won't be misinterpreted as a valid Internet DNS hostname, or an integer which would be interpreted as an IP number. With this hostname form we can avoid the overhead or serialization of hostname lookups by algorithmically translating to an IP address. We parse the number and add it to the base IP address of the cluster nodes, usually 192.168.1.100. (Implementation note: the correct netmask is required for this to work with more than 154 hosts.) > The nodes are able to get > their IP addresses via rarp, but when it tries to connect to > the master node(192.168.1.1:1555) to get the second level > boot image, the slave nodes stall. The leading causes of this are

  A network problem
    Switches set to forced-full-duplex won't work because there is
    no way to set driver parameters during boot
    Report the device driver version and detection message.
    The driver errata list is always changing with the introduction
    of new, not-quite-compatible chips

  A version mismatch between the master and boot disks
    Due to a change in the Scyld boot protocol, the boot floppy/CD-ROM
    must match the master.

> When doing a netstat on the > master node, it says an established tcp connection exists > between .-1:1555 and .0:(some port). During this, no data is > being transferred over the network, so I am sceptical that the > tcp connection actually exists. Yes, netstat is accurately reporting the connection. An established connection indicates that at least a few packets got through. That reduces the likelihood of a device driver problem, but you might still have a bogus switch configuration. > I am going to start looking into this, but I thought you > might have a quick answer that would make me not have to > dig through code and strace output all afternoon. =) Using 'strace' likely won't be as useful as 'tcpdump'. But just monitoring network traffic with /proc/net/dev should give a good indication of what is occurring. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From SGaudet at turbotekcomputer.com Thu Nov 29 05:59:33 2001 From: SGaudet at turbotekcomputer.com (Steve Gaudet) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? Message-ID: <3450CC8673CFD411A24700105A618BD6170FA5@911TURBO> Hello, > > On Wed, Nov 28, 2001 at 02:04:55PM -0500, Velocet's all... > > > On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all... > > > > Microsoft's Xbox packages a 733 MHz Pentium III, 64 > megabytes of memory, > > > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard > disk for about > > > > $300. This would make it a reasonably powerful cluster > node with an > > > > excellent price/performance ratio. Of course, the thing runs a > > > > slimmed-down variant of Windows 2000 instead of Linux, > but has anyone > > > > discussed making an Xbox cluster? > > > > > > Why bother when for about $300 USD you can put together a > > > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > > > > > Sides, who brought 'price/performance' onto this list? > Dont know thats never a > > > factor on the beowulf list? :) > > > > So, the question is, with these numbers, how do people end > up spending > > $250K on 40 or even 60-CPU clusters? > > A low cost system can be built when using MicroATX cases with 145w ps, costs $35.00 and up. For motherboards, I'd look at solid performers like Intel's D815EGEWLU and S815EBM1(1u bd). Here's the list of approved case options. http://www.formfactors.org/searchproducts.asp# Intel's motherboards have a 3 year warranty and don't have some flaky problems seen on clones. http://program.intel.com/shared/products/boards/d815egew/index.htm http://program.intel.com/shared/products/servers/boards/S815EBM1/index.htm The s815ebm1 is a slick motherboard, built for a 1u case and supports Tualatin, costs about $35.00 more than the d815egewlu. The nice thing is they both have video, fast ethernet, ATA100, etc... > They buy from IBM/Compaq/HP or pick your favorite mainstream vendor. If you find a Compaq GEM partner(we are), you fall into the Government, Educational, and Medical category, and you can't beat the deals Compaq is offering right now. For New England they have an Evo D500, PIV 1.5Ghz, 845, 20Gb, 256mb, WIN2000, CD for $667.00 up to December 12th. Moreover, if it's a quantity order they do even better on the price.
FYI: This deal might be available elsewhere, don't know. Cheers, Steve Gaudet Linux Solutions Engineer ..... <(???)>

===================================================================
| Turbotek Computer Corp.   tel:603-666-3062 ext. 21              |
| 8025 South Willow St.     fax:603-666-4519                      |
| Building 2, Unit 105      toll free:800-573-5393                |
| Manchester, NH 03103      e-mail:sgaudet@turbotekcomputer.com   |
| web: http://www.turbotekcomputer.com                            |
===================================================================

> From j.c.burton at gats-inc.com Thu Nov 29 06:46:49 2001 From: j.c.burton at gats-inc.com (John Burton) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? References: <200111281048.LAA10712@dylandog.crs4.it> <3C0555CF.1C29FD19@myri.com> <20011128144747.V18934@cs.ualberta.ca> <3C05672C.E77C9DBF@myri.com> Message-ID: <3C064A59.F18CDAB2@gats-inc.com> Patrick Geoffray wrote: > Hi Paul, > > Paul Lu wrote: > > > To the extent that you can/are allowed, would you care to comment on > > how well these boards perform, especially wrt 64-bit/66 MHz Myrinet > > interfaces? > > Unfortunately, I cannot give this information, it's under NDA. > I can just say it's good. It's not easy to make a good 64/66 PCI, > and AMD did a good job. > I expect the next pre-release to be even better. > > I will send the results to Greg to publish on his web site as > soon as the NDA is over. > I can also tell you that my next cluster will definitely > be based on this machine. > Greetings! I am currently in the process of upgrading an existing cluster used for coarse grain processing (divide input data file into several chunks and process each chunk on separate nodes). Each of the current nodes is a SuperMicro 6010H (SuperMicro 370DER motherboard, serverworks HE-SL chipset) with 2GB of memory and dual 1Ghz Pentium III processors. I'm looking at a 1U product, the AAPRO 1124 which has a Tyan motherboard with 2GB DDR RAM, dual Athlon MP 1800+ processors. Networking is/will be dual 10/100 FDX NICs in a channel bonded config. Does anyone have a feel for how the two systems compare (dual 1Ghz PIII vs dual Athlon 1800+)? Also, will the AMD 760 MPX chipset be a significant enough improvement over the AMD 760MP to warrant waiting (how long???). And finally, since my supplier is a Tyan partner, it's much easier to get Tyan boards - is Tyan coming out with an AMD 760 MPX based dual athlon motherboard? Inquiring minds want to know!!! John From bob at drzyzgula.org Thu Nov 29 07:02:00 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:01:54 2009 Subject: custom hardware (was: Xbox clusters?) In-Reply-To: <3C05EE93.955F54DF@obs.unige.ch>; from daniel.pfenniger@obs.unige.ch on Thu, Nov 29, 2001 at 09:15:15AM +0100 References: <3C05EE93.955F54DF@obs.unige.ch> Message-ID: <20011129100200.A14075@www.snappity.org> On Thu, Nov 29, 2001 at 09:15:15AM +0100, Daniel Pfenniger wrote: > > David Vos wrote: > > > .... > > There is one computer in our cluster that would make me think twice before > > doing a custom build. I prefer to call it the node from heck. It only > > has one problem: it won't boot. If you press the power button, the > > powerlight flashes while the cpu and case fans turn a quarter turn, then > > nothing. You have to wait a minute before you even get that reaction > > again. (Sounds like a short somewhere). The problem only surfaces if the > > computer has been off for a little while, and nearly every time at that.
> > I have seen similar strange behavior of some boxes in a set of 66's, and the > way to restart is also rather odd. > Basically, and this has been repeatedly observed on several boxes of the same > composition (dual Pentium III with ASUS P2BD motherboard) aligned on a metallic > shelf, the ATX box would stop after months of activity, and the simplest found > way to restart it is to unplug everything (power and ethernet), touch it for > a few seconds with hands, replug and voila. No need to open the box! > My guess is that some condensator needs to be unloaded, but exactly why > one needs to unplug every cable appears curious. One thing to understand is that, unless there is a physical switch on the power supply itself, ATX systems are never *really* turned off as long as they are plugged in -- they only go to a "standby" state, wherein +5V power is still being applied to a single pin (the purple wire). When you press the power button on the front of the chassis, it merely shorts a header that ultimately causes the motherboard to short the green wire in the ATX cable to ground -- this is a signal to the power supply to leave standby and start generating power for all the other outputs. Another thing to observe is that generally, ATX power supplies are switching supplies, which means that (to simplify things somewhat) they generate the correct voltage by charging and discharging a capacitor at a high rate. The switching controller constantly monitors the voltage on the capacitor and connects or disconnects the capacitor to the incoming supply, depending on whether the charge is above or below the desired level (the detailed truth behind this is fairly complex and typically involves multiple stages and inductors as well as capacitors, but this model is probably good enough for this discussion...). Thus, even when an ATX system is "off", the power supply is chugging along, keeping a capacitor charged to provide +5V at a low current. BTW, if you have the resources to do this, put a current sensor on the incoming AC line for a running system and feed the output to an oscilloscope. You should see a series of alternating positive and negative spikes -- those are the capacitors charging at the peaks and troughs of the AC voltage. Now, if the ATX board were simply to run the green-wire contact straight through to the power on/off header, you wouldn't need much oomph at all on the +5V standby line, and older ATX power supplies in fact didn't. However, newer boards have things like Wake-on-LAN, Wake-on-Modem, and other various and sundry goodies that have to run off the +5V standby. It has gotten to the point that, in order to do all the processing that is required to leave standby, the standby current draw is greater than what some older supplies can provide. So in the case of a power supply that either by design or fault cannot provide sufficient current under standby, what (I think) happens is that while the motherboard is waiting for the main supply voltages to come up to full power, the standby processing bleeds off the capacitor to the point that the standby voltage sags below the minimum required for operation. At that point, the standby processing halts, the motherboard stops holding the green wire to ground, and the power supply stops trying to power up. It then returns to standby mode, re-charges the standby capacitor, and the cycle begins again. 
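The loop described above is easier to see written out. A toy C model of that feedback cycle -- every number in it is invented purely for illustration, and a real supply is an analog device that is far messier than this:

#include <stdio.h>

/* Toy model of the power-up race: pressing the button makes the board
 * hold PS_ON (the green wire) low while it waits for the main rails;
 * meanwhile the +5V standby rail sags under the standby load.  If
 * +5VSB drops below the standby logic's minimum before the rails are
 * up, the board releases PS_ON, the supply falls back to standby, the
 * capacitor recharges, and the cycle repeats. */
int main (void)
{
    double vsb = 5.0;              /* +5VSB when the button is pressed    */
    const double sag_per_ms = 0.4; /* a weak supply: standby rail droops  */
    const double vmin = 3.0;       /* standby logic browns out below this */
    const int rails_up_ms = 10;    /* time the main rails need to ramp up */

    for (int t = 1; t <= rails_up_ms; t++) {
        vsb -= sag_per_ms;
        if (vsb < vmin) {
            printf ("t=%2d ms: +5VSB sagged to %.1f V -> PS_ON released, "
                    "back to standby, recharge, and around we go again\n",
                    t, vsb);
            return 0;
        }
    }
    printf ("main rails up before the standby rail browned out: power on\n");
    return 0;
}

Drop sag_per_ms to something a healthy supply could manage (say 0.1) and the loop completes -- which is the difference between the boxes that start and the ones that give the single quarter-turn of the fans described earlier in the thread.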
If you have a system that is behaving like this, try putting a voltmeter on the standby pin of the ATX header (you can usually jab a probe down into the back of the connector). You should see it at +5V when the system is "off". Then press the system's "on" button and watch the voltage. You'll most likely see it sag down to a couple of volts or so. If this doesn't happen, you've probably got some other problem, perhaps a POST failure of some sort. Also, this may not be the end of the diagnosis -- it is possible that the failure to provide enough current on standby may not be the fault of the power supply itself. It could be a faulty component (e.g. the SCSI drive we heard about) sucking down too much current on power-up, or an overburdened AC supply circuit that sags just a bit when your system starts up -- in the latter case I imagine that you could wind up with a seemingly jinxed spot in the equipment rack. :-) BTW, if the power supply has too little oomph on standby by *design*, the system will probably *never* power up. If the supply's design meets the new spec only marginally, or if it is malfunctioning, say, because of a damaged or weakened capacitor, then it might behave differently when cold than it does when it is fully warmed up. In this event, unplugging the supply for a while and reconnecting it can create a short window in which the supply can get the system over the hump to leave standby. I in fact have a supply at home that has this problem, and I just sort of live with it because it's not my main system. Someday perhaps I'll replace the supply. As to why you have to disconnect the Ethernet as well, I really don't have a clue. HTH, --Bob Drzyzgula From lindahl at conservativecomputer.com Thu Nov 29 07:24:55 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? In-Reply-To: <3C064A59.F18CDAB2@gats-inc.com>; from j.c.burton@gats-inc.com on Thu, Nov 29, 2001 at 09:46:49AM -0500 References: <200111281048.LAA10712@dylandog.crs4.it> <3C0555CF.1C29FD19@myri.com> <20011128144747.V18934@cs.ualberta.ca> <3C05672C.E77C9DBF@myri.com> <3C064A59.F18CDAB2@gats-inc.com> Message-ID: <20011129102455.A3390@wumpus.foo> On Thu, Nov 29, 2001 at 09:46:49AM -0500, John Burton wrote: > Also, will the AMD 760 MPX chipset be a significant enough > improvement over the AMD 760MP to warrant waiting (how long???). The main improvement of the MPX chipset over the MP is better PCI bandwidth. Given that you are only using bonded fast Ethernet, you won't notice a difference. The reason people care about PCI bandwidth is things like Myrinet and SCSI/IDE RAID, which need a lot of bandwidth. greg From alvin at iplink.net Thu Nov 29 08:06:58 2001 From: alvin at iplink.net (alvin) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? References: Message-ID: <3C065D22.C554211B@iplink.net> "Robert G. Brown" wrote: > > On Wed, 28 Nov 2001, Velocet wrote: > > > On Wed, Nov 28, 2001 at 02:04:55PM -0500, Velocet's all... > > > On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all... > > > > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, > > > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about > > > > $300. This would make it a reasonably powerful cluster node with an > > > > excellent price/performance ratio. Of course, the thing runs a > > > > slimmed-down variant of Windows 2000 instead of Linux, but has anyone > > > > discussed making an Xbox cluster?
> > > > > > > > Why bother when for about $300 USD you can put together a > > > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > > > > > Sides, who brought 'price/performance' onto this list? Dont know thats never a > > > factor on the beowulf list? :) > > > > So, the question is, with these numbers, how do people end up spending > > $250K on 40 or even 60-CPU clusters? > > Well, start with $300 rackmount cases (a rackmount case alone can easily > cost more than an Xbox). Add a high end P4 motherboard, the fastest > P4-Xeon, and fully populate the MoBo with the biggest, most expensive > RDRAM sticks you can find. Get a big, fast SCSI drive and controller. > Finish off with the fastest network you can arrange. [snip] > > This is the Lesson of the Wang. > > (At least for those of you old enough to remember what one is...:-) To put on my humorous hat: I understand that MS is losing cash on each Xbox they sell. Possibly they are looking to do something like Kodak did in its early days, where they sold cameras much cheaper than it cost to produce them so that they could make it up on the film. Well, with the exception that MS wants to sell software. Possibly everybody should go out and buy an Xbox. If we buy enough then we may be able to put the EVIL EMPIRE out of business. And if we all install Linux on the Xboxes then MS will lose out on the ongoing SW sales. -- Alvin Starr || voice: (416)785-4051 Interlink Connectivity || fax: (416)785-3668 alvin@iplink.net || From jlong at arsc.edu Thu Nov 29 09:43:52 2001 From: jlong at arsc.edu (James Long) Date: Wed Nov 25 01:01:54 2009 Subject: mpi-prog porting from lam -> scyld beowulf mpi difficulties In-Reply-To: References: Message-ID: The buffer allocation is only one byte in broadcast_data_master. Looks like you should make it big enough for all your data and options before you broadcast it, as there is no telling what might stomp that memory after you pack it and before it gets sent. Jim At 5:03 PM -0800 11/28/01, Peter Beerli wrote: >Hi, >I have a program developed using MPI-1 under LAM. >It runs fine on several LAM-MPI clusters with different architecture. >A user wants to run it on a Scyld-beowulf cluster and there it fails. >I did a few tests myself and it seems >that the program stalls if run on more than 3 nodes, but seems to work for >2-3 nodes. The program has a master-slave architecture where the master >is mostly doing nothing. There are some reports sent to stdout from any node >(but this seems to work in beompi the same way as in LAM). >There are several things unclear to me >because I have no clue about the beompi system, beowulf and scyld in >particular. > >(1) if I run "top" why do I see 6 processes running when I start > with mpirun -np 3 migrate-n ? > >(2) The data-phase stalls on the slave nodes. > The master node is reading the data from a file and then broadcasts > a large char buffer to the slaves. Is this wrong, is there a better way > to do that [I do not know how big the data is and it is a complex mix > of strings numbers etc.]
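James's suggestion in concrete form, as a minimal sketch -- an illustration only, not the actual migrate-n source: data_fmt, option_fmt, pack_databuffer() and comm_world are the names from Peter's post, and MAX_DATA_BUFFER is a made-up worst-case bound standing in for whatever limit migrate-n can actually compute:

#include <stdlib.h>
#include <mpi.h>

#define MASTER 0                          /* assumed rank of the master          */
#define MAX_DATA_BUFFER (8 * 1024 * 1024) /* invented bound, for the sketch only */

/* data_fmt, option_fmt, pack_databuffer() and comm_world are from
 * migrate-n, as quoted in Peter's post. */
void
broadcast_data_master (data_fmt * data, option_fmt * options)
{
  long bufsize;
  /* allocate the full worst case up front instead of calloc (1, ...),
   * so packing can never write past the end of the buffer */
  char *buffer = (char *) calloc (MAX_DATA_BUFFER, sizeof (char));
  bufsize = pack_databuffer (&buffer, data, options);
  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
  free (buffer);
}

broadcast_data_worker needs no change, since it already sizes its calloc() from the broadcast bufsize; the cleaner alternative is for pack_databuffer() to realloc() the buffer as it packs -- presumably why it takes &buffer -- and return the final size. Peter's routines as originally posted: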
>
>void
>broadcast_data_master (data_fmt * data, option_fmt * options)
>{
>  long bufsize;
>  char *buffer;
>  buffer = (char *) calloc (1, sizeof (char));
>  bufsize = pack_databuffer (&buffer, data, options);
>  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
>  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
>  free (buffer);
>}
>
>void
>broadcast_data_worker (data_fmt * data, option_fmt * options)
>{
>  long bufsize;
>  char *buffer;
>  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
>  buffer = (char *) calloc (bufsize, sizeof (char));
>  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
>  unpack_databuffer (buffer, data, options);
>  free (buffer);
>}
>
>    the master and the first node seem to read the data fine
>    but the others either don't and wait or silently die.
>
>(3) what is the easiest way to debug this? With LAM I just attached to the pids
>    in gdb on the different nodes, but here the nodes are transparent to me
>    [but as I said I have never used a beowulf cluster before].
>
>
>Can you give pointers, hints
>
>thanks
>Peter
>-- 
>Peter Beerli, Genome Sciences, Box #357730, University of Washington,
>Seattle WA 98195-7730 USA, Ph:2065438751, Fax:2065430754
>http://evolution.genetics.washington.edu/PBhtmls/beerli.html
>
>
>
>_______________________________________________
>Beowulf mailing list, Beowulf@beowulf.org
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf

-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
James Long
MPP Specialist
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks, AK 99775-6020

jlong@arsc.edu
(907) 474-5731 work
(907) 474-5494 fax
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

From jonathan at meanwhile.freeserve.co.uk  Thu Nov 29 10:27:04 2001
From: jonathan at meanwhile.freeserve.co.uk (Jonathan Coupe)
Date: Wed Nov 25 01:01:54 2009
Subject: Re. XBox clusters
Message-ID: <001a01c17903$7294c620$2901893e@baby>

----- Original Message -----
From: "Josip Loncaric" 
To: "Beowulf mailing list" 
Sent: Wednesday, November 28, 2001 5:50 PM
Subject: Xbox clusters?

> Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory,
> a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about
> $300. This would make it a reasonably powerful cluster node with an
> excellent price/performance ratio. Of course, the thing runs a
> slimmed-down variant of Windows 2000 instead of Linux, but has anyone
> discussed making an Xbox cluster?
>
> Sincerely,
> Josip
>

I remember people speculating in a similar way re. the Dreamcast. (I did.)
In practice I doubt that a game console will ever be a better bet for
clustering than a PC. Firstly, most of the transistor budget goes into the
3D card, where it's effectively useless for us. Secondly, PCs track the
price of CPUs, etc., much more quickly than consoles. If a console was
*really* heavily subsidised by its maker - consoles usually are subsidised
at launch time - it could start cheaper than the PC. But in a few months it
would have lost this price advantage.

- Jonathan Coupe

From joelja at darkwing.uoregon.edu  Thu Nov 29 10:33:54 2001
From: joelja at darkwing.uoregon.edu (Joel Jaeggli)
Date: Wed Nov 25 01:01:54 2009
Subject: Re. XBox clusters
In-Reply-To: <001a01c17903$7294c620$2901893e@baby>
Message-ID: 

I'd also note that in my latest pc-connection catalog... 1.1ghz celerons
with 128MB of ram and 20GB drives and nics from compaq are $499.
Myself I prefer to build them rather than use shrinkwrap pc's but
passable boxes with warranties are out there...

joelja

On Thu, 29 Nov 2001, Jonathan Coupe wrote:

> ----- Original Message -----
> From: "Josip Loncaric" 
> To: "Beowulf mailing list" 
> Sent: Wednesday, November 28, 2001 5:50 PM
> Subject: Xbox clusters?
>
> > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory,
> > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about
> > $300. This would make it a reasonably powerful cluster node with an
> > excellent price/performance ratio. Of course, the thing runs a
> > slimmed-down variant of Windows 2000 instead of Linux, but has anyone
> > discussed making an Xbox cluster?
> >
> > Sincerely,
> > Josip
> >
>
> I remember people speculating in a similar way re. the Dreamcast. (I did.)
> In practice I doubt that a game console will ever be a better bet for
> clustering than a PC. Firstly, most of the transistor budget goes into the
> 3D card, where it's effectively useless for us. Secondly, PCs track the
> price of CPUs, etc., much more quickly than consoles. If a console was
> *really* heavily subsidised by its maker - consoles usually are subsidised
> at launch time - it could start cheaper than the PC. But in a few months it
> would have lost this price advantage.
>
> - Jonathan Coupe
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

-- 
--------------------------------------------------------------------------
Joel Jaeggli                              joelja@darkwing.uoregon.edu
Academic User Services                    consult@gladstone.uoregon.edu
PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E
--------------------------------------------------------------------------
It is clear that the arm of criticism cannot replace the criticism of
arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy
of the right, 1843.

From SGaudet at turbotekcomputer.com  Thu Nov 29 10:39:02 2001
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Wed Nov 25 01:01:54 2009
Subject: AMD 760 MPX ?
Message-ID: <3450CC8673CFD411A24700105A618BD6170FB0@911TURBO>

Hello,

> Greetings! I am currently in the process of upgrading an
> existing cluster used for coarse-grain processing
> (divide input data file into several chunks and process each
> chunk on separate nodes). Each of the current nodes
> is a SuperMicro 6010H (SuperMicro 370DER motherboard,
> serverworks HE-SL chipset) with 2GB of memory and dual 1Ghz
> Pentium III processors. I'm looking at a 1U product, the
> AAPRO 1124 which has a Tyan motherboard with 2GB DDR
> RAM, dual Athlon MP 1800+ processors. Networking is/will be
> dual 10/100 FDX NICs in a channel bonded config.
> Does anyone have a feel for how the two systems compare (dual
> 1Ghz PIII vs dual Athlon 1800+). Also, will the
> AMD 760 MPX chipset be a significant enough improvement over
> the AMD 760MP to warrant waiting (how long???). And
> finally, since my supplier is a Tyan partner, it's much easier
> to get Tyan boards - is Tyan coming out with an AMD
> 760 MPX based dual athlon motherboard? Inquiring minds want
> to know!!!

Just in from Tyan. The 2466 (Tiger) will be available for sampling
starting next week. The 2468 (Thunder) will be available most likely
the beginning of January.

Cheers,

Steve Gaudet
Linux Solutions Engineer
.....
<(???)>
===================================================================
| Turbotek Computer Corp.      tel:603-666-3062 ext. 21           |
| 8025 South Willow St.        fax:603-666-4519                   |
| Building 2, Unit 105         toll free:800-573-5393             |
| Manchester, NH 03103         e-mail:sgaudet@turbotekcomputer.com |
| web: http://www.turbotekcomputer.com                            |
===================================================================

From jharrop at shaw.ca  Thu Nov 29 11:20:58 2001
From: jharrop at shaw.ca (J Harrop)
Date: Wed Nov 25 01:01:54 2009
Subject: custom hardware (was: Xbox clusters?)
In-Reply-To: <20011129100200.A14075@www.snappity.org>
References: <"from daniel.pfenniger"@obs.unige.ch> <3C05EE93.955F54DF@obs.unige.ch>
Message-ID: <5.0.2.1.0.20011129110501.009f2a30@shawmail>

We have had similar problems over the years, some of which we tracked
down to poor grounding conditions in the building wiring. I know one
location where the weather (in particular rain) can affect the behavior
of parts of the system. I expect the grounding problem would create
problems with similar symptoms on the newer power supplies - but I
can't give a detailed explanation such as the excellent one posted. I
seem to recall that we also had this problem with the older power
supplies. Solution was the same - unplug, wait, reboot.

My favorite hardware problem was when I was working down in Honduras.
One of the laptops became more and more flaky and finally quit booting
at all. When I swapped out the CD-ROM module to try and boot from a
floppy I found a stray ant sitting on the inside edge of the connector!
On further inspection the inside of the laptop turned out to be packed
with them. I wanted to duct-tape the machine closed and mail the box
back to Dell with a "bug report" taped on it ;-)

John Harrop

At 10:02 AM 29/11/2001 -0500, you wrote:
>On Thu, Nov 29, 2001 at 09:15:15AM +0100, Daniel Pfenniger wrote:
> >
> > David Vos wrote:
> > > ....
> > > There is one computer in our cluster that would make me think twice before
> > > doing a custom build. I prefer to call it the node from heck. It only
> > > has one problem: it won't boot. If you press the power button, the
> > > powerlight flashes while the cpu and case fans turn a quarter turn, then
> > > nothing. You have to wait a minute before you even get that reaction
> > > again. (Sounds like a short somewhere). The problem only surfaces if the
> > > computer has been off for a little while, and nearly every time at that.
> >
> > I have seen similar strange behavior of some boxes in a set of 66's, and the
> > way to restart is also rather odd.
> > Basically, and this has been repeatedly observed on several boxes of the same
> > composition (dual Pentium III with ASUS P2BD motherboard) aligned on a metallic
> > shelf, the ATX box would stop after months of activity, and the simplest found
> > way to restart it is to unplug everything (power and ethernet), touch it for
> > a few seconds with hands, replug and voila. No need to open the box!
> > My guess is that some capacitor needs to be discharged, but exactly why
> > one needs to unplug every cable appears curious.
>
>One thing to understand is that, unless there is a physical
>switch on the power supply itself, ATX systems are never
>*really* turned off as long as they are plugged in -- they
>only go to a "standby" state, wherein +5V power is still
>being applied to a single pin (the purple wire).
>When you press the power button on the front of the chassis,
>it merely shorts a header that ultimately causes the
>motherboard to short the green wire in the ATX cable to
>ground -- this is a signal to the power supply to leave
>standby and start generating power for all the other
>outputs.
>
>Another thing to observe is that generally, ATX power
>supplies are switching supplies, which means that (to
>simplify things somewhat) they generate the correct voltage
>by charging and discharging a capacitor at a high rate. The
>switching controller constantly monitors the voltage on the
>capacitor and connects or disconnects the capacitor to the
>incoming supply, depending on whether the charge is above or
>below the desired level (the detailed truth behind this is
>fairly complex and typically involves multiple stages and
>inductors as well as capacitors, but this model is probably
>good enough for this discussion...). Thus, even when an ATX
>system is "off", the power supply is chugging along, keeping
>a capacitor charged to provide +5V at a low current. BTW, if
>you have the resources to do this, put a current sensor on
>the incoming AC line for a running system and feed the
>output to an oscilloscope. You should see a series of
>alternating positive and negative spikes -- those are the
>capacitors charging at the peaks and troughs of the AC
>voltage.
>
>Now, if the ATX board were simply to run the green-wire
>contact straight through to the power on/off header, you
>wouldn't need much oomph at all on the +5V standby line, and
>older ATX power supplies in fact didn't. However, newer
>boards have things like Wake-on-LAN, Wake-on-Modem, and
>other various and sundry goodies that have to run off the
>+5V standby. It has gotten to the point that, in order to
>do all the processing that is required to leave standby, the
>standby current draw is greater than what some older
>supplies can provide. So in the case of a power supply that
>either by design or fault cannot provide sufficient current
>under standby, what (I think) happens is that while the
>motherboard is waiting for the main supply voltages to come
>up to full power, the standby processing bleeds off the
>capacitor to the point that the standby voltage sags below
>the minimum required for operation. At that point, the
>standby processing halts, the motherboard stops holding the
>green wire to ground, and the power supply stops trying to
>power up. It then returns to standby mode, re-charges the
>standby capacitor, and the cycle begins again.
>
>If you have a system that is behaving like this, try putting
>a voltmeter on the standby pin of the ATX header (you can
>usually jab a probe down into the back of the connector).
>You should see it at +5V when the system is "off". Then
>press the system's "on" button and watch the voltage. You'll
>most likely see it sag down to a couple of volts or so. If
>this doesn't happen, you've probably got some other problem,
>perhaps a POST failure of some sort. Also, this may not be
>the end of the diagnosis -- it is possible that the failure
>to provide enough current on standby may not be the fault of
>the power supply itself. It could be a faulty component
>(e.g. the SCSI drive we heard about) sucking down too much
>current on power-up, or an overburdened AC supply circuit
>that sags just a bit when your system starts up -- in the
>latter case I imagine that you could wind up with a
>seemingly jinxed spot in the equipment rack. :-)
>
>BTW, if the power supply has too little oomph on standby by
>*design*, the system will probably *never* power up. If the
>supply's design meets the new spec only marginally, or if it
>is malfunctioning, say, because of a damaged or weakened
>capacitor, then it might behave differently when cold than
>it does when it is fully warmed up. In this event,
>unplugging the supply for a while and reconnecting it can
>create a short window in which the supply can get the system
>over the hump to leave standby. I in fact have a supply at
>home that has this problem, and I just sort of live with it
>because it's not my main system. Someday perhaps I'll
>replace the supply.
>
>As to why you have to disconnect the Ethernet as well, I
>really don't have a clue.
>
>HTH,
>--Bob Drzyzgula
>_______________________________________________
>Beowulf mailing list, Beowulf@beowulf.org
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf

From beerli at genetics.washington.edu  Thu Nov 29 13:19:33 2001
From: beerli at genetics.washington.edu (Peter Beerli)
Date: Wed Nov 25 01:01:54 2009
Subject: mpi-prog porting from lam -> scyld beowulf mpi difficulties
In-Reply-To: 
Message-ID: 

Jim,
the buffer in broadcast_data_master gets allocated to the size needed
inside pack_databuffer() [which returns the allocated size of the
buffer] before the buffer is broadcast (roughly as sketched below).

Peter

On Thu, 29 Nov 2001, James Long wrote:

> The buffer allocation is only one byte in broadcast_data_master.
> Looks like you should make it big enough for all your data and
> options before you broadcast it, as there is no telling what might
> stomp that memory after you pack it and before it gets sent.
>
> Jim
>
> At 5:03 PM -0800 11/28/01, Peter Beerli wrote:
> >Hi,
> >I have a program developed using MPI-1 under LAM.
> >It runs fine on several LAM-MPI clusters with different architecture.
> >A user wants to run it on a Scyld-beowulf cluster and there it fails.
> >I did a few tests myself and it seems
> >that the program stalls if run on more than 3 nodes, but seems to work for
> >2-3 nodes. The program has a master-slaves architecture where the master
> >is mostly doing nothing. There are some reports sent to stdout from any node
> >(but this seems to work in beompi the same way as in LAM).
> >There are several things unclear to me
> >because I have no clue about the beompi system, beowulf and scyld in
> >particular.
> >
> >(1) if I run "top" why do I see 6 processes running when I start
> >    with mpirun -np 3 migrate-n ?
> >
> >(2) The data-phase stalls on the slave nodes.
> >    The master node is reading the data from a file and then broadcasts
> >    a large char buffer to the slaves. Is this wrong, is there a better way
> >    to do that [I do not know how big the data is and it is a complex mix
> >    of strings numbers etc.]
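Roughly, pack_databuffer() behaves like the sketch below -- simplified
from memory; calc_needed_size() is a made-up stand-in for the real size
computation, which walks the data once before packing:

long
pack_databuffer (char **buffer, data_fmt * data, option_fmt * options)
{
  long bufsize;
  /* hypothetical stand-in for the real size computation */
  bufsize = calc_needed_size (data, options);
  /* grow the caller's 1-byte buffer to the full packed size */
  *buffer = (char *) realloc (*buffer, bufsize * sizeof (char));
  /* ... pack options, sequences and numbers into *buffer ... */
  return bufsize;
}

so by the time MPI_Bcast() sees it, the buffer really is bufsize bytes
long. The routines in question: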
> >
> >void
> >broadcast_data_master (data_fmt * data, option_fmt * options)
> >{
> >  long bufsize;
> >  char *buffer;
> >  buffer = (char *) calloc (1, sizeof (char));
> >  bufsize = pack_databuffer (&buffer, data, options);
> >  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
> >  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
> >  free (buffer);
> >}
> >
> >void
> >broadcast_data_worker (data_fmt * data, option_fmt * options)
> >{
> >  long bufsize;
> >  char *buffer;
> >  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
> >  buffer = (char *) calloc (bufsize, sizeof (char));
> >  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
> >  unpack_databuffer (buffer, data, options);
> >  free (buffer);
> >}
> >
> >    the master and the first node seem to read the data fine
> >    but the others either don't and wait or silently die.
> >
> >(3) what is the easiest way to debug this? With LAM I just attached to the pids
> >    in gdb on the different nodes, but here the nodes are transparent to me
> >    [but as I said I have never used a beowulf cluster before].
> >
> >
> >Can you give pointers, hints
> >
> >thanks
> >Peter
> >-- 
> >Peter Beerli, Genome Sciences, Box #357730, University of Washington,
> >Seattle WA 98195-7730 USA, Ph:2065438751, Fax:2065430754
> >http://evolution.genetics.washington.edu/PBhtmls/beerli.html
> >
> >
> >
> >_______________________________________________
> >Beowulf mailing list, Beowulf@beowulf.org
> >To change your subscription (digest mode or unsubscribe) visit
> >http://www.beowulf.org/mailman/listinfo/beowulf
>

-- 
Peter Beerli, Genome Sciences, Box #357730, University of Washington,
Seattle WA 98195-7730 USA, Ph:2065438751, Fax:2065430754
http://evolution.genetics.washington.edu/PBhtmls/beerli.html

From wsb at paralleldata.com  Thu Nov 29 13:45:31 2001
From: wsb at paralleldata.com (W Bauske)
Date: Wed Nov 25 01:01:54 2009
Subject: Xbox clusters?
References: <3450CC8673CFD411A24700105A618BD6170FA5@911TURBO>
Message-ID: <3C06AC7B.77FFC84A@paralleldata.com>

Steve Gaudet wrote:
> 
> 
> > They buy from IBM/Compaq/HP or pick your favorite mainstream vendor.
> 
> If you find a Compaq GEM partner(we are), you fall into the Government,
> Educational, and Medical category, and you can't beat the deals Compaq is
> offering right now. For New England they have an Evo D500, PIV 1.5Ghz, 845,
> 20Gb, 256mb, WIN2000, CD for $667.00 up to December 12th. Moreover if it's a
> quantity they do even better on the price.
> 

Wonder why medical? That's big business.

I'm in business to make money with clusters so I guess I wouldn't qualify
for that program. However, I can build an equivalent node for less than
$500. (Skipping the CD and win2k which I have no use for)

d845wnl     $130
P4 1.5ghz   $152
Case/PS     $ 30
20GB disk   $ 63
256MB dimm  $ 30
AGP card    $ 20
==============
total       $425

Shipping would be around $35 delivered to your door. All you need is
a screwdriver to assemble...

The d845wnl has 10/100 built in and is PXE bootable.

If you like P4 1.9Ghz systems, add $120 and you have a screaming node
for $545. (if you like P4's for your codes)

It's amazing how cheap nodes are now.
Wes

From agrajag at scyld.com  Thu Nov 29 15:46:10 2001
From: agrajag at scyld.com (Sean Dilda)
Date: Wed Nov 25 01:01:55 2009
Subject: mpi-prog porting from lam -> scyld beowulf mpi difficulties
In-Reply-To: ; from beerli@genetics.washington.edu on Wed, Nov 28, 2001 at 05:03:46PM -0800
References: 
Message-ID: <20011129184610.C17892@blueraja.scyld.com>

On Wed, 28 Nov 2001, Peter Beerli wrote:

> (1) if I run "top" why do I see 6 processes running when I start
>     with mpirun -np 3 migrate-n ?

Two per node. For every process you want running, it also runs another
one to take care of the MPI network I/O. Our MPI is based off of mpich,
and this is how they have it set up.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 232 bytes
Desc: not available
Url : http://www.scyld.com/pipermail/beowulf/attachments/20011129/ae2ddf85/attachment.bin

From bill at math.ucdavis.edu  Thu Nov 29 20:34:02 2001
From: bill at math.ucdavis.edu (Bill Broadley)
Date: Wed Nov 25 01:01:55 2009
Subject: MPI I/O + nfs
Message-ID: <20011129203402.A26613@sphere.math.ucdavis.edu>

I'm trying to get MPICH-1.2.2.3 MPI I/O + nfs working. I read:
http://www-unix.mcs.anl.gov/mpi/mpich/docs/install/node31.htm

Step 1:
~/private/io> /usr/sbin/rpcinfo -p `hostname` | grep nfs
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs

I'm using clients n1 and n2:
n2:~> mount | grep noac
master:/d0 on /d0 type nfs (rw,nfsvers=3,noac,addr=192.168.0.250)
n1:~> mount | grep noac
master:/d0 on /d0 type nfs (rw,nfsvers=3,noac,addr=192.168.0.250)

Just to make absolutely sure I'm using nfs 3, I ran nfsstat on n1 and
n2 (same result):

Client nfs v2:
null       getattr    setattr    root       lookup     readlink
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
read       wrcache    write      create     remove     rename
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
link       symlink    mkdir      rmdir      readdir    fsstat
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%

Client nfs v3:
null       getattr    setattr    lookup     access     readlink
0       0% 222540 54% 83      0% 10010   2% 52      0% 53      0%
read       write      create     mkdir      symlink    mknod
67772  16% 103571 25% 2070    0% 2       0% 0       0% 0       0%
remove     rmdir      rename     link       readdir    readdirplus
2068    0% 2       0% 0       0% 0       0% 172     0% 0       0%
fsstat     fsinfo     pathconf   commit
356     0% 356     0% 0       0% 1372    0%

When running a very simple MPI I/O example I still get:

File locking failed in ADIOI_Set_lock. If the file system is NFS, you
need to use NFS version 3 and mount the directory with the 'noac'
option (no attribute caching).

Anyone have any ideas? Anyone know of an MPICH mailing list?

Additional info:
n1:~> uname -a
Linux n1 2.4.9 #5 SMP Wed Sep 26 19:59:17 GMT-7 2001 i686 unknown
n2:~> uname -a
Linux n2 2.4.9 #5 SMP Wed Sep 26 19:59:17 GMT-7 2001 i686 unknown

-- 
Bill Broadley
Mathematics/Institute of Theoretical Dynamics
UC Davis

From TIMOTHY.R.WAIT at saic.com  Wed Nov 28 12:07:47 2001
From: TIMOTHY.R.WAIT at saic.com (Tim Wait)
Date: Wed Nov 25 01:01:55 2009
Subject: Xbox clusters?
References: <3C0523F6.254E0EE9@icase.edu> <20011128140455.E1210@velocet.ca> <20011128144018.G1210@velocet.ca>
Message-ID: <3C054413.3030004@apo.saic.com>

> So, the question is, with these numbers, how do people end up spending
> $250K on 40 or even 60-CPU clusters?
> 

Um, high speed interconnect at $1500/box, quality components,
>=512 MB per proc, rackmounts, big h/w raid storage, A/C...
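Back of the envelope, with made-up but plausible list prices (none of
these numbers are quotes), a hypothetical 40-node machine of that sort
adds up fast:

  40 x node (1U case, board, CPU, disk)   40 x $2000 = $ 80,000
  40 x 1GB registered memory              40 x $1000 = $ 40,000
  40 x Myrinet NIC + cable                40 x $1500 = $ 60,000
  Myrinet switch                                     = $ 30,000
  Racks, rails, PDUs                                 = $ 16,000
  RAID storage + front end                           = $ 15,000
  Spares, UPS, A/C work                              = $ 10,000
  -------------------------------------------------------------
                                                       $251,000

The interconnect and the "boring" line items swamp the $500-a-node
case in no time.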
tim

From schweng at master2.astro.unibas.ch  Thu Nov 29 05:24:30 2001
From: schweng at master2.astro.unibas.ch (Hans Schwengeler)
Date: Wed Nov 25 01:01:55 2009
Subject: Portland High Performance Fortran pghpf on Scyld cluster
Message-ID: <200111291324.OAA07606@master2.astro.unibas.ch>

Hello,

I want to use pghpf on our new Scyld cluster (b27-8). pgf77 and pgf90
work ok, but pghpf appears to hang during execution of the resulting
program. First trial was to point /usr/local/mpi/lib to /usr/lib/,
second try was building mpich-1.2.1 (from the Scyld ftp site after
applying the patches). Both have the result that f77 and f90 work,
but NOT pghpf. I also tried the advice from the pgi FAQ and replaced
mpi.o in /usr/local/pgi/linux86/lib/libpghpf_mpi.a but to no avail.

Test program is /home/schweng/util/mpich-1.2.1-6.6.beo/mpich-1.2.1/installtest/pi3.f.

/usr/local/bin/mpirun -np 2 pi3
 Process 0 of 2 is alive
Enter the number of intervals: (0 quits)
<-- here it hangs, i.e. Process 1 never comes to life.

Yours, Hans Schwengeler.

From matz at wsunix.wsu.edu  Fri Nov 30 10:59:14 2001
From: matz at wsunix.wsu.edu (Phillip D. Matz)
Date: Wed Nov 25 01:01:55 2009
Subject: time command defaults changed in RedHat 7.2 vs RedHat 6.2?
Message-ID: <003201c179d1$1aefe660$b4297986@chem.wsu.edu>

I am used to keeping track of the actual time (elapsed) a job takes to
complete on my cluster with the command line option "time" in RedHat
6.2. Recently I reinstalled RedHat 7.2 and now the "time" command
yields different results (as if the portable option "-p" is always
on). The man pages only help to tell me why the output looks the way
it does, but don't tell me how to change the default back to what it
looks like in a 6.2 installation.

Does anyone know which file I need to modify to make the time command
report the total elapsed time and not have the output be in the
portable format? Thanks!

Phil Matz

From rlatham at plogic.com  Fri Nov 30 11:22:49 2001
From: rlatham at plogic.com (Rob Latham)
Date: Wed Nov 25 01:01:55 2009
Subject: MPI I/O + nfs
In-Reply-To: <20011129203402.A26613@sphere.math.ucdavis.edu>; from bill@math.ucdavis.edu on Thu, Nov 29, 2001 at 08:34:02PM -0800
References: <20011129203402.A26613@sphere.math.ucdavis.edu>
Message-ID: <20011130142249.K10306@otto.plogic.internal>

On Thu, Nov 29, 2001 at 08:34:02PM -0800, Bill Broadley wrote:
> 
> I'm trying to get MPICH-1.2.2.3 MPI I/O + nfs working.

If you want ROMIO (MPI I/O), I strongly suggest using pvfs as the
"back end" for your file system. In the few cases I know of where a
customer used nfs as the back end, performance was downright poor (as
should be expected when you have to turn off all the caching).

start here: http://parlweb.parl.clemson.edu/pvfs/index.html

==rob

-- 
[ Rob Latham   Developer, Admin, Alchemist ]
[ Paralogic Inc. - www.plogic.com          ]
[                                          ]
[ EAE8 DE90 85BB 526F 3181  1FCF 51C4 B6CB 08CC 0897 ]

From lmeerkat at yahoo.com  Fri Nov 30 09:31:07 2001
From: lmeerkat at yahoo.com (L. Gritsenko)
Date: Wed Nov 25 01:01:55 2009
Subject: Scyld boot problem
Message-ID: <20011130173107.49716.qmail@web20609.mail.yahoo.com>

Maybe this will be helpful:
http://www.beowulf.org/pipermail/beowulf/2001-August/001057.html

=====

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From math at velocet.ca  Fri Nov 30 14:37:53 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:55 2009
Subject: Xbox clusters?
In-Reply-To: <3C06AC7B.77FFC84A@paralleldata.com>; from wsb@paralleldata.com on Thu, Nov 29, 2001 at 03:45:31PM -0600
References: <3450CC8673CFD411A24700105A618BD6170FA5@911TURBO> <3C06AC7B.77FFC84A@paralleldata.com>
Message-ID: <20011130173753.B1210@velocet.ca>

On Thu, Nov 29, 2001 at 03:45:31PM -0600, W Bauske's all...
> Steve Gaudet wrote:
> > 
> > 
> > > They buy from IBM/Compaq/HP or pick your favorite mainstream vendor.
> > 
> > If you find a Compaq GEM partner(we are), you fall into the Government,
> > Educational, and Medical category, and you can't beat the deals Compaq is
> > offering right now. For New England they have an Evo D500, PIV 1.5Ghz, 845,
> > 20Gb, 256mb, WIN2000, CD for $667.00 up to December 12th. Moreover if it's a
> > quantity they do even better on the price.
> > 
> 
> Wonder why medical? That's big business.
> 
> I'm in business to make money with clusters so I guess I wouldn't qualify
> for that program. However, I can build an equivalent node for less than
> $500. (Skipping the CD and win2k which I have no use for)
> 
> d845wnl     $130
> P4 1.5ghz   $152
> Case/PS     $ 30
> 20GB disk   $ 63
> 256MB dimm  $ 30
> AGP card    $ 20
> ==============
> total       $425
> 
> Shipping would be around $35 delivered to your door. All you need is
> a screwdriver to assemble...
> 
> The d845wnl has 10/100 built in and is PXE bootable.

Any athlon boards with new chipsets that are PXE bootable?

The PcChips M817 MLR has that, but it's not a great board, and an old
chipset.

From wsb at paralleldata.com  Fri Nov 30 15:47:14 2001
From: wsb at paralleldata.com (W Bauske)
Date: Wed Nov 25 01:01:55 2009
Subject: Xbox clusters?
References: <3450CC8673CFD411A24700105A618BD6170FA5@911TURBO> <3C06AC7B.77FFC84A@paralleldata.com> <20011130173753.B1210@velocet.ca>
Message-ID: <3C081A82.F1B4436B@paralleldata.com>

I PXE boot my tiger MP's (s2460) with Intel pro/100 pci adapters.
Adapters go for about $27 which I thought was fair to allow me to
boot/install without a floppy or CD. The floppy and CD combined are
more than that typically.

The boards I've used that have built-in Enet for Athlon have used some
sort of Netware boot capability which I know nothing about. (K7S5A I
think)

Wes

Velocet wrote:
> 
> On Thu, Nov 29, 2001 at 03:45:31PM -0600, W Bauske's all...
> > Steve Gaudet wrote:
> > > 
> > > 
> > > > They buy from IBM/Compaq/HP or pick your favorite mainstream vendor.
> > > 
> > > If you find a Compaq GEM partner(we are), you fall into the Government,
> > > Educational, and Medical category, and you can't beat the deals Compaq is
> > > offering right now. For New England they have an Evo D500, PIV 1.5Ghz, 845,
> > > 20Gb, 256mb, WIN2000, CD for $667.00 up to December 12th. Moreover if it's a
> > > quantity they do even better on the price.
> > > 
> > 
> > Wonder why medical? That's big business.
> > 
> > I'm in business to make money with clusters so I guess I wouldn't qualify
> > for that program. However, I can build an equivalent node for less than
> > $500. (Skipping the CD and win2k which I have no use for)
> > 
> > d845wnl     $130
> > P4 1.5ghz   $152
> > Case/PS     $ 30
> > 20GB disk   $ 63
> > 256MB dimm  $ 30
> > AGP card    $ 20
> > ==============
> > total       $425
> > 
> > Shipping would be around $35 delivered to your door. All you need is
> > a screwdriver to assemble...
> > 
> > The d845wnl has 10/100 built in and is PXE bootable.
> 
> Any athlon boards with new chipsets that are PXE bootable?
> 
> The PcChips M817 MLR has that, but it's not a great board, and an old
> chipset.
> 
> /kc
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From ron_chen_123 at yahoo.com  Fri Nov 30 19:41:47 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:55 2009
Subject: MPI I/O + nfs
In-Reply-To: <20011129203402.A26613@sphere.math.ucdavis.edu>
Message-ID: <20011201034147.26940.qmail@web14703.mail.yahoo.com>

There is no MPICH mailing list. You can email the MPICH developers
directly.

On the other hand, you may check the LAM MPI mailing list; maybe they
have encountered similar problems before:
http://www.lam-mpi.org/mailman/listinfo.cgi/lam-announce

-Ron

--- Bill Broadley wrote:
> Anyone have any ideas? Anyone know of an MPICH
> mailing list?

__________________________________________________
Do You Yahoo!?
Buy the perfect holiday gifts at Yahoo! Shopping.
http://shopping.yahoo.com

From ron_chen_123 at yahoo.com  Fri Nov 30 19:55:22 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:55 2009
Subject: GCC/Fortran 90/95 questions
Message-ID: <20011201035522.85826.qmail@web14706.mail.yahoo.com>

> 2) Does gcc support f90 or f95? If not is there any
> GNU compiler that does, are any expected to be in
> the future?

There is a compiler called open64, which is SGI's compiler for IA64.
They have a C front-end, which is based on gcc, and they have another
for f90. (I don't know the details...)

Recently, they have ported the f90 front-end and run-time to other
compiler back-ends. Please read the note below for details.

http://open64.sourceforge.net/
http://sourceforge.net/tracker/?group_id=34861&atid=413342

-Ron

===========================================================
Porting open64 F90 front-end to Solaris

This patch ports the open64 Fortran90 compiler front end to the
sparc_solaris platform. Specifically, it ports these three executable
programs: "mfef90", "ir_tools", and "whirl2f". ANY OTHER COMPONENT OF
OPEN64 IS NOT IN THE SCOPE OF THIS PATCH.

Tested platforms include sparc_solaris, mips_irix and ia32_linux,
using both GNU gcc and the vendor compiler. Makefiles, some header
files and some c/c++ source files were modified for porting.

__________________________________________________
Do You Yahoo!?
Buy the perfect holiday gifts at Yahoo! Shopping.
http://shopping.yahoo.com
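A closing footnote to the MPI I/O + nfs thread above: the "File
locking failed in ADIOI_Set_lock" message means an ordinary fcntl()
byte-range lock failed, and over NFS those locks go through the
network lock manager -- so besides nfsvers=3 and noac it is worth
checking that rpc.statd is running and that the kernel lockd can reach
the server. The small program below is only a sketch to isolate the
ROMIO side; the "nfs:" filename prefix (which ROMIO accepts to force a
particular driver) and the /d0/locktest path are assumptions matched
to the mounts shown in that thread.

#include <stdio.h>
#include "mpi.h"

int main (int argc, char **argv)
{
    MPI_File fh;
    MPI_Status status;
    int rank, value;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    value = rank;

    /* "nfs:" forces ROMIO's NFS driver; /d0 is the noac-mounted export */
    MPI_File_open (MPI_COMM_WORLD, "nfs:/d0/locktest",
                   MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);

    /* each rank writes its own int; this path exercises ADIOI_Set_lock */
    MPI_File_write_at (fh, (MPI_Offset) (rank * sizeof (int)),
                       &value, 1, MPI_INT, &status);

    MPI_File_close (&fh);
    MPI_Finalize ();
    return 0;
}

If this small case fails with the same ADIOI_Set_lock error, the
problem is in the NFS locking setup rather than in the application --
and Rob's suggestion of pvfs as the back end sidesteps NFS locking
entirely.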