From SGaudet at turbotekcomputer.com Thu Nov 1 05:55:17 2001
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Wed Nov 25 01:01:50 2009
Subject: AMD testing
Message-ID: <3450CC8673CFD411A24700105A618BD6170EE5@911TURBO>

Hello Don,

> I had a problem with a Tyan 2460 that I got when they first came out.
> It seems that the board actually only supports 3 memory modules even
> though it has 4 slots. (??!!) I had the board in a cluster with several
> other Tyan 2462 board machines, all with 4 256MB memory modules. We
> were doing testing on the system, and the machine with the 2460 was
> giving garbage results for a calculation that used about 700MB of
> memory. All smaller jobs had tested OK. I couldn't find the problem
> (went through different memory, kernels, etc.), then I started
> thinking ... why does Tyan say the board only supports 3GB of memory?
> Sure enough, when I took 1 module out of the 2460 machine, it ran the
> big test job correctly. I tested this on a newer order of the
> motherboards and they seemed to be OK. The markings on the motherboard
> still say "A" but it looks a little different; the old one had dots
> around it. (?) I don't know if this is an isolated problem, just a bad
> board ... ??? However, I have seen other complaints about memory
> problems with the 2460. Also, I discovered that the sockets are REALLY
> fussy about how you insert the modules. If you don't get them just
> right, memtest86 will generate errors on the modules even though they
> test good on other boards. I assume you had 4 512MB modules in your
> machine; I suggest you try leaving DIMM4 empty and testing the system
> again.
>
> I let Tyan know about the problem but haven't received a response.

I've seen this problem before with Tyan motherboards. A year ago we had
the same issue with the S2510NG ThunderLE (dual NIC, 4MB ATI graphics,
no SCSI). The problem was hardware-related: you couldn't install
anything in the fourth memory slot and expect to see it. My guess is
it's another hardware-related problem. I'd try smaller-density RAM first
and see if the fourth slot is working at all.

Cheers,

Steve Gaudet
..... <(???)>
==========
Turbotek Computer Corp.
8025 South Willow St.
Manchester, NH 03103
toll free: 800-573-5393
tel: 603-666-3062 ext. 21
fax: 603-666-4519
e-mail: sgaudet@turbotekcomputer.com
web: http://www.turbotekcomputer.com

From lindahl at conservativecomputer.com Thu Nov 1 07:45:56 2001
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Wed Nov 25 01:01:50 2009
Subject: AMD testing
In-Reply-To: <0GM3004KF97Y0W@mta5.rcsntx.swbell.net>; from kinghorn@pqs-chem.com on Wed, Oct 31, 2001 at 03:06:19PM -0600
References: <0GM3004KF97Y0W@mta5.rcsntx.swbell.net>
Message-ID: <20011101104556.C10893@wumpus.foo>

On Wed, Oct 31, 2001 at 03:06:19PM -0600, Donald B. Kinghorn wrote:

> I had a problem with a Tyan 2460 that I got when they first came out.
> It seems that the board actually only supports 3 memory modules even
> though it has 4 slots. (??!!)

Capacitance problem. I had some Alpha LX boards that were like that:
256MB DIMMs weren't on the compatibility list, not because they didn't
work at all, but because you couldn't fill all the slots with them.
Well, I've had machines running reliably for years that had a mix of
256 and 128MB DIMMs.

BTW, are you sure you're using registered memory instead of unbuffered?
One of the features of cheaper, unbuffered memory is that you can't use
that many of them.
This probably isn't the case, because I don't think 3 or even 2 of those
would necessarily work...

greg

From agrajag at scyld.com Thu Nov 1 07:29:40 2001
From: agrajag at scyld.com (Sean Dilda)
Date: Wed Nov 25 01:01:50 2009
Subject: a shell script to spawn mpi executables on the "free" nodes of a Scyld cluster
In-Reply-To: <3BE062CB.C474552D@aviion.univ-lemans.fr>; from fcalvay@aviion.univ-lemans.fr on Wed, Oct 31, 2001 at 09:44:59PM +0100
References: <3BE062CB.C474552D@aviion.univ-lemans.fr>
Message-ID: <20011101102940.A25971@blueraja.scyld.com>

On Wed, 31 Oct 2001, Florent Calvayrac wrote:

> to those with the same problem
>
> Since I couldn't find any free programs to easily address this issue,
> I include below a dirty bash2 script to spawn mpi executables on the
> "free" nodes of a Scyld cluster.
>
> Comments and feedback welcome

This script is just spawning jobs on the nodes that are using less cpu
time, right? If you are using our latest release, -8, mpich
automatically uses beomap to map which nodes the jobs go to, and
beomap's default behavior is to automatically map the jobs to the nodes
that have the lowest cpu usage. Is there something to this script that
I'm missing that mpich with beomap doesn't do for you?

From raysonlogin at yahoo.com Thu Nov 1 08:18:55 2001
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Wed Nov 25 01:01:50 2009
Subject: [PBS-USERS] PVFS
In-Reply-To: <002301c162ed$b2055670$990c2a80@batman>
Message-ID: <20011101161855.33405.qmail@web11407.mail.yahoo.com>

I think this discussion belongs on the beowulf mailing list. Anyway,
back to your question, please read the sample chapter:

http://www.oreilly.com/catalog/clusterlinux/chapter/ch09.html

Rayson

--- Brent Clements wrote:
> This is waaaay off the subject here....but anyone using PVFS? And why
> would I want to use it in my linux cluster?
>
> -Brent Clements

From kinghorn at pqs-chem.com Thu Nov 1 09:45:19 2001
From: kinghorn at pqs-chem.com (Donald B. Kinghorn)
Date: Wed Nov 25 01:01:50 2009
Subject: AMD testing
Message-ID: <0GM40000YUL0GY@mta5.rcsntx.swbell.net>

... the memory modules I'm using are Crucial Registered ECC PC2100 ...
should be good ... and Tyan lists using 4 of these modules as a "tested"
configuration.

I'm disappointed that I didn't get a response from Tyan, since this is a
serious issue. I would much rather have a system fail outright than just
produce erroneous results for some problems. I'll be glad to see some
other vendors enter the dual Athlon market.

I should note again that the newer 2460 boards I received seem to be OK.
However, I would urge anyone using these boards to do thorough testing.

-Don

From gabriel.weinstock at dnamerican.com Thu Nov 1 13:09:16 2001
From: gabriel.weinstock at dnamerican.com (Gabriel J. Weinstock)
Date: Wed Nov 25 01:01:50 2009
Subject: Linksys EG1064
Message-ID: <01110116091602.01763@patagonia.dnamerican.com>

Is the Linksys GigE EG1064 NIC well supported under Linux? I scoured the
web for information and was able to find that it has a driver for
FreeBSD, but I didn't find it in the hardware compatibility lists on
RedHat or SuSE's sites.
I would just like to know before purchasing.

Thanks in advance,

-gabriel

From hahn at physics.mcmaster.ca Thu Nov 1 13:14:14 2001
From: hahn at physics.mcmaster.ca (Mark Hahn)
Date: Wed Nov 25 01:01:50 2009
Subject: AMD testing
In-Reply-To: <20011101104556.C10893@wumpus.foo>
Message-ID:

> > I had a problem with a Tyan 2460 that I got when they first came
> > out. It seems that the board actually only supports 3 memory modules
> > even though it has 4 slots. (??!!)
>
> Capacitance problem. I had some Alpha LX boards that were like that:

perhaps. people often don't realize that a dimm can be single- or
double-sided, though: double-sided uses up two "banks" that the chipset
supports. so, for instance, 4 slots might well be reasonable for a
chipset that supports 6 banks, since you might be able to put 4
single-sided dimms in the slots and have them work. (6 banks is fairly
common among chipsets I've looked at recently. note that these 'banks'
are different from the banks internal to a single chip...)

> 256 MB DIMMs weren't on the compatibility list, not because they

a 256M dimm that consisted of 8 256Mb parts would count as one-sided,
for instance, but one that had 16 128Mb parts would be two-sided.
offhand, I'd guess that all reg/buf dimms count as one-sided, but I
suppose two-sided might exist, too, and be 1 gate cheaper...

regards, mark hahn.

From zadok at phreaker.net Thu Nov 1 14:48:45 2001
From: zadok at phreaker.net (Hereward Cooper)
Date: Wed Nov 25 01:01:50 2009
Subject: AMD testing (fwd)
In-Reply-To:
References:
Message-ID: <20011101224845.20ec5cbe.zadok@phreaker.net>

> > I had a problem with a Tyan 2460 that I got when they first came
> > out. It seems that the board actually only supports 3 memory modules
> > even though it has 4 slots. (??!!) I had the board in a cluster with
> > several other Tyan 2462 board machines, all with 4 256MB memory
> > modules. We were doing testing on the system, and the machine with
> > the 2460 was giving garbage results for a calculation that used
> > about 700MB of memory. All smaller jobs had tested OK. I couldn't
> > find the problem (went through different memory, kernels, etc.),
> > then I started thinking ... why does Tyan say the board only
> > supports 3GB of memory? Sure enough, when I took 1 module out of the
> > 2460 machine, it ran the big test job correctly. I tested this on a
> > newer order of the motherboards and they seemed to be OK. The
> > markings on the motherboard still say "A" but it looks a little
> > different; the old one had dots around it. (?) I don't know if this
> > is an isolated problem, just a bad board ... ??? However, I have
> > seen other complaints about memory problems with the 2460. Also, I
> > discovered that the sockets are REALLY fussy about how you insert
> > the modules. If you don't get them just right, memtest86 will
> > generate errors on the modules even though they test good on other
> > boards. I assume you had 4 512MB modules in your machine; I suggest
> > you try leaving DIMM4 empty and testing the system again.
> >
> > I let Tyan know about the problem but haven't received a response.

The manual that comes with the board does have a table showing the
possible combinations of memory, though it does state that it doesn't
list them all. But it still gives you an idea.
Thanks,

Hereward

From Florent.Calvayrac at univ-lemans.fr Fri Nov 2 01:24:15 2001
From: Florent.Calvayrac at univ-lemans.fr (Florent Calvayrac)
Date: Wed Nov 25 01:01:50 2009
Subject: a shell script to spawn mpi executables on the "free" nodes of a Scyld cluster
References: <3BE062CB.C474552D@aviion.univ-lemans.fr> <20011101102940.A25971@blueraja.scyld.com>
Message-ID: <3BE2663F.2394621F@univ-lemans.fr>

> This script is just spawning jobs on the nodes that are using less cpu
> time, right?

yes... but you'll admit that until this release, the problem was
present, and it was discussed here about two months ago.

> If you are using our latest release, -8, mpich automatically uses
> beomap to map which nodes the jobs go to, and beomap's default
> behavior is to automatically map the jobs to the nodes that have the
> lowest cpu usage.

I am glad to learn it; this had escaped my attention. We are indeed
using the -7 release, since it takes some time to come from LinuxCentral
to here on CD... and since you are a commercial company and (as far as I
know) only release a large bunch of source packages on the FTP site, I
estimated that the whole compilation and installation time was too high
and decided to stay with -7 until -8 is available on LinuxCentral.

--
Florent Calvayrac
Laboratoire de Physique de l'Etat Condense
http://www.univ-lemans.fr/~fcalvay
Universite du Maine-Faculte des Sciences
72085 Le Mans Cedex 9

From rajkumar at csse.monash.edu.au Fri Nov 2 01:38:21 2001
From: rajkumar at csse.monash.edu.au (Rajkumar Buyya)
Date: Wed Nov 25 01:01:50 2009
Subject: Info on: comp.distributed
Message-ID: <3BE2698D.FED3B8E9@csse.monash.edu.au>

Dear All,

FYI, discussions are currently underway for the creation of an
unmoderated Usenet newsgroup called comp.distributed, to address grid
and peer-to-peer (even cluster) issues, or any other issues related to
collectives of network-connected distributed resources. Discussions are
taking place on the news.groups newsgroup, and voting is expected to
begin in mid-November. The current draft of the Request for Discussion
(RFD), which includes a description of the charter, can be found on
news.announce.newgroups or attached below.

Thanks
Raj

-------------------------------------------------------------------
REQUEST FOR DISCUSSION (RFD): unmoderated newsgroup comp.distributed

This is a formal Request For Discussion (RFD) for the creation of the
world-wide unmoderated Usenet newsgroup comp.distributed. This is not a
Call for Votes (CFV); you cannot vote at this time. Procedural details
are below.

Newsgroup line:
comp.distributed        Distributed Resource Sharing and Exploitation.

CHANGES from previous RFD: This is an updated version of the previously
submitted RFD for comp.p2p-grid. It addresses many comments and concerns
raised during discussion of the earlier RFD, including the
recommendation of a new name, comp.distributed.

RATIONALE: comp.distributed

Networks in general, and the internet specifically, have been evolving
from star topologies of thin clients or dumb terminals connected to
central servers, to a collection of highly connected nodes, many having
significant compute, storage, and peripherals, along with human
presence. Likewise, internet tools and protocols have evolved from being
primarily a mechanism to "push" (via email) or "pull" (via web-browser)
untyped data, into supporting more interactive, semantic, and
bi-directional relationships.
These changes have prompted different communities to (re-)explore the
potential of sharing and exploiting collections of heterogeneous,
geographically distributed resources such as computers, data, people,
and scientific instruments in a secure and consistent manner, usually
lacking any central control or authority. These efforts are often
described with terms like "peer-to-peer" ("p2p") and "grids", and can
serve to virtualize enterprises by blurring the significance of physical
location.

Different communities tend to focus on different varieties of resources,
different overall objectives and constraints, and different degrees of
permanence of the resource collectives. For example, "grid" communities
will often consider large, semi-permanent (though dynamically
constituted) collections of world-class resources that can be accessed
much as utilities, to provide unprecedented capabilities that enable,
for example, large-scale problems in science, engineering, and commerce.
"p2p" communities, on the other hand, often seek on-demand temporary
relationships between everyday personal computers, devices, and
peripherals "at the edge of the network", that help to solve every-day
problems of sharing, collaboration, and computing in more efficient,
convenient, and economical ways. Similar relationships have been
explored over time in areas related to human collaboration, distributed
databases, distributed search, parallel and distributed computing, web
services, and hierarchical content delivery networks.

In spite of these differences, all of these communities share a large
number of challenges as a direct result of attempting to effectively and
synergistically assemble and use these collectives of heterogeneous
distributed resources. These challenges include:

* Lack of any central authority, leading to the potential unannounced
  availability or withdrawal of resources, requiring fault-tolerant
  applications and complicating the discovery and scheduling of
  resources.

* Heterogeneous resources, requiring methods to recognize and request
  unique functionality when needed, while hiding unexploitable resource
  differences behind consistent interfaces.

* Heterogeneous performance in those resources, prompting the use of
  simulation and performance modeling to determine which resources to
  use when.

* Heterogeneous requirements from both resource owners and end users in
  terms of their objectives, quality of service, and computational
  economy.

* Unpredictable and dynamic network topology and properties, requiring
  the ability to portably deal with differing latency and bandwidth
  constraints (e.g. hiding latency while minimizing overhead) and
  motivating quality of service (QoS) mechanisms.

* A complex and unpredictable concurrent environment, requiring general
  approaches to program development that hide these features while
  leveraging existing tools, languages, and techniques wherever
  possible.

* A memory hierarchy that can extend to the memory and disk throughout
  the collective, prompting a reconsideration of traditional data
  storage and caching approaches.

* The potential presence of untrusted resources and/or actors, requiring
  decentralized approaches to privacy, authorization, authentication,
  anonymity, and the determination of levels of acceptable risk
  associated with different operational modes.
* Achieving return on investment for both resource users and providers,
  requiring approaches for auditable accounting and reimbursement as
  well as the consideration of cost/price as a resource selection
  parameter.

* Impediments to connectivity, including firewalls and oversubscribed
  scarce network resources (such as dial-in modems, and IP addresses
  shared through network address translation/IP masquerading).

* Cross-organizational IT involvement, requiring flexible and
  politically acceptable policies, procedures, and management tools.

* Evaluating and proposing mechanisms and policies for the protection of
  intellectual property in an environment explicitly designed to
  facilitate instant sharing.

* Understanding and exploiting the potential value of these resource
  collectives, including effective collaboration strategies, integration
  of mixed resource types into problem solving environments, novel
  application areas and solution approaches enabled by this environment,
  and the use of automated agents.

Already, international academic and commercial forums like:

* Global Grid Forum:
* Peer to Peer Computing WG:
* Universal Plug-n-Play Forum
* New Productivity Initiative

have evolved to create standards and protocols for inter-operability
between heterogeneous systems providing virtual services. Recently,
infrastructure projects like the NSF Distributed TeraScale Facility have
focused even more attention, and include involvement from several
companies. Many computer and/or software vendors, large and small, have
recently announced specific projects or general priorities in p2p and/or
grids, including IBM, Intel, DSTC, Sun, and Microsoft. Some details on
these and other projects can be found at:

* http://www.gridcomputing.com/
* http://www.computer.org/dsonline/gc/index.htm
* http://www.peertal.com/
* http://www.nwfusion.com/
* http://www.peerintelligence.com/
* http://www.openp2p.com/

Although over 20 discussion mailing lists operated by individuals or
institutions exist, they are generally intended for discussion of
specific group priorities, and strongly segregate the p2p and grid
communities, even when addressing similar issues. Another concern is
that mailing lists are likely to generate a large volume of email for
members; therefore, moderators will often discourage use of these lists
for general or controversial discussion, and many prospective
participants feel discouraged from subscribing, do not become members,
and do not join important topical discussions. We believe that having a
newsgroup where people can participate in discussions of their own
choosing, when they want, without getting swamped with emails, will help
overcome these limitations and will encourage discussion and
dissemination without the need of explicit membership.

While some existing newsgroups, like comp.parallel and comp.sys.super,
touch on some specialized aspects of this topic, and will continue to do
so, this new group will serve as a focal point for considering the
inter-relationships, interactions, and synergies when combining these
separate technologies.

Strategy for publicising the comp.distributed newsgroup:

The formation of the comp.distributed newsgroup will be publicised
through the following channels (but not limited to):

* IEEE DS Online,
* Global Grid Forum,
* P2P WG,
* Grid Infoware,
* IEEE/ACM conferences:
  * CCGRID'xy: ,
  * GRID'xy: ,
* Yahoo Group on gridcomputing as part of GridInfoware.
* IEEE Task Force on Cluster Computing (TFCC)
* Newsgroups such as comp.parallel

END RATIONALE.
CHARTER: comp.distributed

Although the name "comp.distributed" has been chosen due to its
familiarity and convenience, the group is to be broader than just those
topics traditionally regarded as "distributed computing". Specifically,
topics are to include any unique issues relating to the creation and
exploitation of collectives of geographically distributed and
potentially heterogeneous resources such as computers, data/information
sources, peripherals, instruments, and humans.

Appropriate areas of discussion in this context would include (but are
not limited to):

* discovering, scheduling/brokering, and accessing remote resources
* exploitation of heterogeneous resources
* resource management, scheduling, and computational economy
* portable/adaptable communication substrates
* quality of service approaches
* portable program development tools, languages, techniques
* data management tools and techniques
* exploitation of distributed memory hierarchy
* decentralized security
* practical accounting, reimbursement, and business & revenue models
* overcoming impediments to wide-area connectivity
* cross-organizational policy issues and ways to address them
* mechanisms and policies for intellectual property
* programming tools, environments, and languages
* applications, collaboration, and distributed agents
* simulation and performance modelling
* comparisons of grid and p2p, and issues unique to each
* events, surveys, news and general announcements

It is expected that additional 3rd-level subgroups addressing some of
these topics or others may be created as dictated by the volume and
cohesiveness of resulting message traffic.

END CHARTER.

PROCEDURE:

This is a request for discussion, not a call for votes. In this phase of
the process, any potential problems with the proposed newsgroups should
be raised and resolved. The discussion period will continue for a
minimum of 21 days (starting from when the first RFD for this proposal
is posted to news.announce.newgroups), after which a Call For Votes
(CFV) will be posted by a neutral vote taker. Please do not attempt to
vote until this happens.

All discussion of this proposal should be posted to news.groups.

This RFD attempts to comply fully with the Usenet newsgroup creation
guidelines outlined in "How to Create a New Usenet Newsgroup" and "How
to Format and Submit a New Group Proposal". Please refer to these
documents (available in news.announce.newgroups) if you have any
questions about the process.

END PROCEDURE.

DISTRIBUTION: comp.distributed

This RFD has been posted to the following newsgroups:

news.announce.newgroups
news.groups
comp.arch
comp.parallel
comp.parallel.pvm
comp.parallel.mpi
comp.sys.super
comp.client-server

and to the following mailing lists:

END DISTRIBUTION.

Proponent: Rajkumar Buyya
Proponent: David C. DiNucci

From agrajag at scyld.com Fri Nov 2 05:07:30 2001
From: agrajag at scyld.com (Sean Dilda)
Date: Wed Nov 25 01:01:50 2009
Subject: a shell script to spawn mpi executables on the "free" nodes of a Scyld cluster
In-Reply-To: <3BE2663F.2394621F@univ-lemans.fr>; from Florent.Calvayrac@univ-lemans.fr on Fri, Nov 02, 2001 at 10:24:15AM +0100
References: <3BE062CB.C474552D@aviion.univ-lemans.fr> <20011101102940.A25971@blueraja.scyld.com> <3BE2663F.2394621F@univ-lemans.fr>
Message-ID: <20011102080730.A27515@blueraja.scyld.com>

On Fri, 02 Nov 2001, Florent Calvayrac wrote:

> > This script is just spawning jobs on the nodes that are using less
> > cpu time, right?
>
> yes... but you'll admit that until this release, the problem was
> present, and it was discussed here about two months ago.

Yes. We saw the problem discussed on the list, which is one of the
reasons we made beomap: to solve the problem. I appreciate you sending
out a fix for the problem; I just wanted to let you know that we already
have our own solution that works without running an extra script.

(In case you're curious, beomap actually pulls the cpu load info from
libbeostat.)

From Florent.Calvayrac at univ-lemans.fr Fri Nov 2 08:49:11 2001
From: Florent.Calvayrac at univ-lemans.fr (Florent.Calvayrac)
Date: Wed Nov 25 01:01:50 2009
Subject: a shell script to spawn mpi executables on the "free" nodes of a Scyld cluster
In-Reply-To: <20011102080730.A27515@blueraja.scyld.com> from "Sean Dilda" at Nov 02, 2001 08:07:30 AM
Message-ID: <200111021649.RAA11656@pecbip1.univ-lemans.fr>

I had a look at the sources of mpprun, thinking of using libbeostat
indeed, but could not figure out a way to avoid having one process
running on the master node. Is this problem fixed in the -8 release of
Scyld?

--
Florent Calvayrac                          | Tel : 02 43 83 26 26
Laboratoire de Physique de l'Etat Condense | Fax : 02 43 83 35 18
UMR-CNRS 6087                              | http://www.univ-lemans.fr/~fcalvay
Universite du Maine-Faculte des Sciences   | 72085 Le Mans Cedex 9

From derek.richardson at pgs.com Fri Nov 2 10:34:12 2001
From: derek.richardson at pgs.com (Derek Richardson)
Date: Wed Nov 25 01:01:50 2009
Subject: Linksys EG1064
Message-ID: <1004726053.1914.128.camel@idoru.hstn.tensor.pgs.com>

Gabriel,

Can't speak for the Linksys card, but I've had good experiences w/ Intel
gigabit ethernet. I'd love to give you the model #, but it's from IBM,
so it's their model #'s. It runs off the e1000 kernel module, though,
and handles our NFS load quite well.

Regards,

Derek R.

--
Junior Linux Geek
713-817-1197 (cell)
713-781-4000 x2267 (office)
"Linux users, fanatical. No way... HEY! Get that MCSE up on the altar,
Tux must be appeased!"

From SThomaso at phmining.com Fri Nov 2 11:25:22 2001
From: SThomaso at phmining.com (Scott Thomason)
Date: Wed Nov 25 01:01:50 2009
Subject: Compile farm?
Message-ID:

Greetings. I'm interested in setting up a shell account/batch
process/compile farm system for our developers, and I'm wondering if
Beowulf clusters are well suited to that task. We're not interested in
writing parallel code using PVM or MPI; we just want to log into what
appears to be one big server and have it dispatch the workload amongst
the slave processors. Is Beowulf good at that?

---scott

p.s. Sorry if there are duplicates of this message; I used the wrong
email address earlier.

From ron_chen_123 at yahoo.com Fri Nov 2 11:44:54 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:50 2009
Subject: Compile farm?
In-Reply-To:
Message-ID: <20011102194454.28360.qmail@web14707.mail.yahoo.com>

What you need is a batch system.

There are 2 free batch systems, SGE and PBS. Both of them are open
source, but you can nevertheless get 7x24 support if you are willing to
pay.

PBS:
www.openpbs.com
www.pbspro.com

SGE:
www.sun.com/gridware
gridengine.sunsource.net

Also, SGE has qmake, which can execute several instances of make on
multiple machines for a single make job.
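For example, a distributed build can be submitted like this (a sketch
only -- the flags shown are illustrative, the parallel environment named
"make" is an assumption of this example, and exact options can differ
between SGE releases, so check the qmake documentation for your
version):

  # -cwd runs the build in the current directory, -v PATH exports your
  # PATH to the execution hosts, and -pe make 1-8 requests up to 8 slots
  # in a parallel environment named "make" (site-specific assumption).
  # Everything after "--" is passed to make itself.
  qmake -cwd -v PATH -pe make 1-8 -- -j 8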
Install note:
http://supportforum.sun.com/gridengine/appnote_install.html

-Ron

--- Scott Thomason wrote:
> Greetings. I'm interested in setting up a shell account/batch
> process/compile farm system for our developers, and I'm wondering if
> Beowulf clusters are well suited to that task. We're not interested in
> writing parallel code using PVM or MPI; we just want to log into what
> appears to be one big server and have it dispatch the workload amongst
> the slave processors. Is Beowulf good at that?
>
> ---scott
>
> p.s. Sorry if there are duplicates of this message; I used the wrong
> email address earlier.

From bremner at unb.ca Fri Nov 2 12:42:07 2001
From: bremner at unb.ca (David Bremner)
Date: Wed Nov 25 01:01:50 2009
Subject: Compile farm?
In-Reply-To: <20011102194454.28360.qmail@web14707.mail.yahoo.com>
References: <20011102194454.28360.qmail@web14707.mail.yahoo.com>
Message-ID: <15331.1311.605294.128939@convex.cs.unb.ca>

Ron Chen writes:
> What you need is a batch system.
>
> There are 2 free batch systems, SGE and PBS.
>
[good info snipped]

It is not obvious that a batch system is the best answer to this
particular problem.

Mosix (www.mosix.org) may be more appropriate for providing a single
system image.

From raysonlogin at yahoo.com Fri Nov 2 13:03:58 2001
From: raysonlogin at yahoo.com (Rayson Ho)
Date: Wed Nov 25 01:01:50 2009
Subject: Compile farm?
In-Reply-To: <15331.1311.605294.128939@convex.cs.unb.ca>
Message-ID: <20011102210358.80694.qmail@web11401.mail.yahoo.com>

IMO, either Mosix or a batch system can do the job.

However, Mosix requires patching/recompiling the kernel, and the recent
changes in the VM make running a non-standard kernel troublesome.

For the case of a batch system, the system admin only needs to install
the package; no recompiling of the kernel. And users can even submit
jobs from their workstations. The jobs are queued until there are
resources for them to run.

Rayson

> It is not obvious that a batch system is the best answer to this
> particular problem.
>
> Mosix (www.mosix.org) may be more appropriate for providing a single
> system image.
>
> From the point of view of efficient use of resources, a batch system
> is probably important.

From zadok at phreaker.net Sat Nov 3 13:26:05 2001
From: zadok at phreaker.net (Hereward Cooper)
Date: Wed Nov 25 01:01:50 2009
Subject: [ot] Re: AMD testing
Message-ID: <20011103212605.20c9b83d.zadok@phreaker.net>

Hi there,

Has any user of the Tiger MP S2460 had experience of what happens if you
DON'T use registered memory? Will it blow up :-) ??

Thanks,

Hereward

--
What, never seen a signature file before?

From xyzzy at speakeasy.org Sat Nov 3 17:11:52 2001
From: xyzzy at speakeasy.org (Trent Piepho)
Date: Wed Nov 25 01:01:50 2009
Subject: [ot] Re: AMD testing
In-Reply-To: <20011103212605.20c9b83d.zadok@phreaker.net>
Message-ID:

On Sat, 3 Nov 2001, Hereward Cooper wrote:

> Hi there,
>
> Has any user of the Tiger MP S2460 had experience of what happens if
> you DON'T use registered memory? Will it blow up :-) ??
There was a review of this board when it came out, at Tom's Hardware or
AnandTech, I'm not sure which. They tested registered vs. non-registered
memory. If you use more than two DIMM slots, you need registered. Three
non-registered DIMMs won't work, and two non-registered plus one
registered won't work either. It doesn't explode or catch fire (only if
the heatsink falls off...), but it won't pass POST. Registered ECC is
only a couple dollars more than non-registered ECC, so there really is
no reason not to get it.

From math at velocet.ca Sat Nov 3 18:10:31 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:50 2009
Subject: [ot] Re: AMD testing
In-Reply-To: <20011103212605.20c9b83d.zadok@phreaker.net>; from zadok@phreaker.net on Sat, Nov 03, 2001 at 09:26:05PM +0000
References: <20011103212605.20c9b83d.zadok@phreaker.net>
Message-ID: <20011103211031.D27471@velocet.ca>

On Sat, Nov 03, 2001 at 09:26:05PM +0000, Hereward Cooper's all...

> Hi there,
>
> Has any user of the Tiger MP S2460 had experience of what happens if
> you DON'T use registered memory? Will it blow up :-) ??

No, it just doesn't work. I got a couple systems booted, but when I
typed 'cat' I got a 'signal 9' error, which I've never seen before.
Later, I booted again and my bash login shell wouldn't start - 'signal
11' error. A third time logging in, login(1) died with a signal 11 as
well. It just doesn't work.

On www.crucial.com registered DDR ECC RAM is only $4 or $5 more per
stick.

/kc

> Thanks,
>
> Hereward
>
> --
> What, never seen a signature file before?

--
Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA

From math at velocet.ca Sat Nov 3 18:14:39 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:50 2009
Subject: Tyan Tiger MP (was Re: [ot] Re: AMD testing)
In-Reply-To: ; from xyzzy@speakeasy.org on Sat, Nov 03, 2001 at 05:11:52PM -0800
References: <20011103212605.20c9b83d.zadok@phreaker.net>
Message-ID: <20011103211439.E27471@velocet.ca>

On Sat, Nov 03, 2001 at 05:11:52PM -0800, Trent Piepho's all...

> On Sat, 3 Nov 2001, Hereward Cooper wrote:
> > Hi there,
> >
> > Has any user of the Tiger MP S2460 had experience of what happens if
> > you DON'T use registered memory? Will it blow up :-) ??
>
> There was a review of this board when it came out, at Tom's Hardware
> or AnandTech, I'm not sure which. They tested registered vs.
> non-registered memory. If you use more than two DIMM slots, you need
> registered. Three

I had one DIMM in, 256MB, running FreeBSD.

> non-registered DIMMs won't work, and two non-registered plus one
> registered won't work either. It doesn't explode or catch fire (only
> if the heatsink falls off...), but it won't pass POST. Registered ECC
> is only a couple dollars more than non-registered ECC, so there really
> is no reason not to get it.

It passed POST no problem, got through all the rc files, but then
started dying as soon as I logged in. Your mileage may vary.

BTW, do NOT use 300W power supplies. I blew 2 trying. You need 30A on
the +5V line. The 350W PSs I got do 32A on +5 and work great (and seem
to be of higher quality altogether too).

Watch out with the heatsinks you use on the Tyan Tiger; Golden Orbs do
NOT FIT with all the caps surrounding the CPUs. Use square or WIDE
(rectangular) heatsinks. A long one or a circular one just won't fit.
BTW - anyone have experience running non-MP Athlons on these boards? I
booted it with a couple and ran various jobs (dnetc, gromacs, g98, a
bunch of compile jobs of said programs, as well as FreeBSD and Linux
kernels among other things) and I've had no problems yet.

/kc

--
Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA

From mark at markrichman.com Sun Nov 4 09:01:28 2001
From: mark at markrichman.com (Mark A. Richman)
Date: Wed Nov 25 01:01:50 2009
Subject: Web based process accounting
Message-ID: <000001c16552$5886d6c0$6801a8c0@yoda>

Are there any web front ends to PBS or process accounting tools?

Thanks,
Mark Richman

From zadok at phreaker.net Sun Nov 4 09:31:48 2001
From: zadok at phreaker.net (Hereward Cooper)
Date: Wed Nov 25 01:01:50 2009
Subject: Tyan Tiger MP (was: Re: [ot] Re: AMD testing)
In-Reply-To: <200111041701.fA4H1E020173@blueraja.scyld.com>
References: <200111041701.fA4H1E020173@blueraja.scyld.com>
Message-ID: <20011104173148.714fcfea.zadok@phreaker.net>

once upon a time (actually it was more like Sun, 4 Nov 2001 12:01:14
-0500), beowulf-request@beowulf.org said:

> BTW, do NOT use 300W power supplies. I blew 2 trying. You need 30A on
> the +5V line. The 350W PSs I got do 32A on +5 and work great (and seem
> to be of higher quality altogether too).

thanks for the tip, shame I didn't know before, as I went and bought a
300W one yesterday that only does 25A on the +5V line :-( but at least
it only cost £15.

> Watch out with the heatsinks you use on the Tyan Tiger; Golden Orbs do
> NOT FIT with all the caps surrounding the CPUs. Use square or WIDE
> (rectangular) heatsinks. A long one or a circular one just won't fit.

The Akasa Icicle 765's I got with my mobo work great (or as far as I can
currently tell, as the machine hasn't been running for more than 20
seconds, but they fit tight and fully cover the chip + more).

> BTW - anyone have experience running non-MP Athlons on these boards? I
> booted it with a couple and ran various jobs (dnetc, gromacs, g98, a
> bunch of compile jobs of said programs, as well as FreeBSD and Linux
> kernels among other things) and I've had no problems yet.

Sounds promising, did you get any noticeable drop in performance?

Thanks,

Hereward

--
What, never seen a signature file before?

From eric at fnordsystems.com Sun Nov 4 10:19:35 2001
From: eric at fnordsystems.com (Eric Kuhnke)
Date: Wed Nov 25 01:01:50 2009
Subject: Tyan Tiger MP (was: Re: [ot] Re: AMD testing)
In-Reply-To: <20011104173148.714fcfea.zadok@phreaker.net>
Message-ID:

£15 power supplies of any variety are invariably garbage... there are
plenty of 300W power supplies in the $30-$35 range from larger Taiwanese
manufacturers that put out 30A on the 5V wire. But then, the price diff
between 300 and 350W is often $5, so go with the higher wattage.

An excellent Athlon/AthlonXP/AthlonMP cooler is the Dynatron
DC1206BM-L; it measures 60x60mm (horizontally) and uses a unique
micro-fin design. I've had very good results with it on the 1.53GHz
Palomino core CPUs. This HSF costs around $20 each.
URL: http://www.dynatron-corp.com/proddetail.asp?cid=6&sku=DC1206BM-L

> > BTW - anyone have experience running non-MP Athlons on these boards?
> > I booted it with a couple and ran various jobs (dnetc, gromacs, g98,
> > a bunch of compile jobs of said programs, as well as FreeBSD and
> > Linux kernels among other things) and I've had no problems yet.
>
> Sounds promising, did you get any noticeable drop in performance?

I know it's possible to run dual Athlon-C (Thunderbird) 1.4GHz CPUs on
the Tyan S2460, but it's not advisable unless your budget is really
limited. The Palominos (AthlonMP/AthlonXP) perform at least 15% better
in many FPU-intensive tasks.

Eric Kuhnke
Lead Engineer / Operations Manager
Fnord Datacenter Systems Inc.
eric@fnordsystems.com
www.fnordsystems.com
voice: +1-360-527-3301

From jakob at unthought.net Sun Nov 4 12:29:12 2001
From: jakob at unthought.net (Jakob Østergaard)
Date: Wed Nov 25 01:01:50 2009
Subject: Compile farm?
In-Reply-To: <15331.1311.605294.128939@convex.cs.unb.ca>; from bremner@unb.ca on Fri, Nov 02, 2001 at 04:42:07PM -0400
References: <20011102194454.28360.qmail@web14707.mail.yahoo.com> <15331.1311.605294.128939@convex.cs.unb.ca>
Message-ID: <20011104212912.W14001@unthought.net>

On Fri, Nov 02, 2001 at 04:42:07PM -0400, David Bremner wrote:

> Ron Chen writes:
> > What you need is a batch system.
> >
> > There are 2 free batch systems, SGE and PBS.
> >
> [good info snipped]
>
> It is not obvious that a batch system is the best answer to this
> particular problem.
>
> Mosix (www.mosix.org) may be more appropriate for providing a single
> system image.

I tried this with Mosix. Problem is - Mosix migrates jobs after a while.
Initially a compiler takes up a few megabytes of memory, but "after a
while" it has grown to hundreds of megabytes. When Mosix decides to
migrate the compiler, it will spend a long time on the network moving
the large process image.

There's a patch to make(1) that integrates it with Mosix, but I didn't
try that out.
Instead, I implemented http://unthought.net/antsd which will distribute
your compilers efficiently to the proper nodes. It's not very
sophisticated, but it does the job for me at least :)

--
................................................................
: jakob@unthought.net     : And I see the elder races,         :
:.........................: putrid forms of man                :
: Jakob Østergaard        : See him rise and claim the earth,  :
: OZ9ABN                  : his downfall is at hand.           :
:.........................:............{Konkhra}...............:

From jnellis at dslextreme.com Sun Nov 4 13:15:50 2001
From: jnellis at dslextreme.com (Joe Nellis)
Date: Wed Nov 25 01:01:50 2009
Subject: Scyld distro- Help with examples
Message-ID: <001701c16575$e1535b40$73f2a540@dslextreme.com>

Greetings,

I am writing a tutorial for a recently constructed Scyld Beowulf cluster
(-7 basic ed.) and I have some questions on the location of include
files. Basically I am having users copy /usr/mpi-beowulf/examples to
their home directory and then make/compile the examples there so they
can play with them. My problem comes with the hello++.cc example. The
include file is mpi++.h, which further asks for other includes in the
/usr/include/mpi-beowulf/ directory when they are actually located in
the /usr/include/mpi-beowulf/c++/ directory. Were these files supposed
to be stuffed into this 'c++' subdirectory for some reason, and is it
safe to move them up to the parent directory so the example can compile?

thanks,
Joe Nellis
jnellis@dslextreme.com
beowulf@cecs.csulb.edu

From jnellis at dslextreme.com Sun Nov 4 13:45:21 2001
From: jnellis at dslextreme.com (Joe Nellis)
Date: Wed Nov 25 01:01:50 2009
Subject: Using NFS with Scyld (-7 ver.)
Message-ID: <002501c1657a$00c4d680$73f2a540@dslextreme.com>

Greetings (again),

We are having problems getting the nodes to see users' home accounts.
Our master node mounts an NFS share for all /home files. We have changed
and uncommented the /etc/beowulf/fstab file so that MASTER =
192.168.10.251, which is the first NIC. After rebooting the nodes we did
a

  >> bpsh -a df

and saw that the nodes are mounting the master at 192.168.10.1:/home
(the second NIC). Doing a

  >> bpsh 4 ls home

lists all users, but any attempt to get details or dig farther down,

  >> bpsh 4 ls -al home
  or
  >> bpsh 4 ls home/jnellis

gives a file or directory not found error.

So I am wondering two things, since I am not a networking guy. Do we
have the MASTER= in the fstab pointed at the right IP address (I am
guessing it shouldn't point directly at the NFS server)? Secondly, is
there something we are missing that must allow requests for /home files
on the nodes to pass THROUGH the master? I ask this because the nodes
report mounting /home through the NIC #2 address while the node fstab
points at the NIC #1 address.

thanks,
Joe Nellis
jnellis@dslextreme.com

From agrajag at scyld.com Sun Nov 4 13:57:52 2001
From: agrajag at scyld.com (Sean Dilda)
Date: Wed Nov 25 01:01:50 2009
Subject: Scyld distro- Help with examples
In-Reply-To: <001701c16575$e1535b40$73f2a540@dslextreme.com>; from jnellis@dslextreme.com on Sun, Nov 04, 2001 at 01:15:50PM -0800
References: <001701c16575$e1535b40$73f2a540@dslextreme.com>
Message-ID: <20011104165752.A26086@blueraja.scyld.com>

On Sun, 04 Nov 2001, Joe Nellis wrote:

> Greetings,
>
> I am writing a tutorial for a recently constructed Scyld Beowulf
> cluster (-7 basic ed.) and I have some questions on the location of
> include files.
> Basically I am having users copy /usr/mpi-beowulf/examples to their
> home directory and then make/compile the examples there so they can
> play with them. My problem comes with the hello++.cc example. The
> include file is mpi++.h, which further asks for other includes in the
> /usr/include/mpi-beowulf/ directory when they are actually located in
> the /usr/include/mpi-beowulf/c++/ directory. Were these files supposed
> to be stuffed into this 'c++' subdirectory for some reason, and is it
> safe to move them up to the parent directory so the example can
> compile?

The C++ bindings for MPI do not work in -7. You will have to use -8 if
you want C++ to work with MPI.

Sean

From agrajag at scyld.com Sun Nov 4 14:09:56 2001
From: agrajag at scyld.com (Sean Dilda)
Date: Wed Nov 25 01:01:51 2009
Subject: Using NFS with Scyld (-7 ver.)
In-Reply-To: <002501c1657a$00c4d680$73f2a540@dslextreme.com>; from jnellis@dslextreme.com on Sun, Nov 04, 2001 at 01:45:21PM -0800
References: <002501c1657a$00c4d680$73f2a540@dslextreme.com>
Message-ID: <20011104170956.A26403@blueraja.scyld.com>

On Sun, 04 Nov 2001, Joe Nellis wrote:

> shouldn't point directly at the NFS server)? Secondly, is there
> something we are missing that must allow requests for /home files on
> the nodes to pass THROUGH the master? I ask this because the nodes
> report mounting /home through the NIC #2 address while the node fstab
> points at the NIC #1 address.

As far as I know, you cannot get a Linux box to mount an NFS filesystem,
then re-export it over NFS to another machine. So, as far as I know,
what you're asking is impossible.

I might suggest giving people their own home directories on the master
and just teaching them how to scp over the files they need.

From ron_chen_123 at yahoo.com Sun Nov 4 15:34:20 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:51 2009
Subject: Web based process accounting
In-Reply-To: <000001c16552$5886d6c0$6801a8c0@yoda>
Message-ID: <20011104233420.32143.qmail@web14708.mail.yahoo.com>

There is a package called PBSWeb, which provides a web GUI for PBS:

http://www.cs.ualberta.ca/~pinchak/PBSWeb/

-Ron

--- "Mark A. Richman" wrote:
> Are there any web front ends to PBS or process accounting tools?
>
> Thanks,
> Mark Richman

From ron_chen_123 at yahoo.com Sun Nov 4 15:40:00 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:51 2009
Subject: Fwd: Re: PBS/Veridian (Re: [PBS-USERS] Re: DRM standard API)
Message-ID: <20011104234000.39120.qmail@web14704.mail.yahoo.com>

Many closed-source companies claim that open-source products do not have
support. In fact, open-source tools can come with very good support.

-Ron

--- Gabriel Mateescu wrote:
> Date: Wed, 31 Oct 2001 11:58:42 -0500
> From: Gabriel Mateescu
>
> Indeed, Veridian-PBS stands out, due to a very prompt and competent
> technical support.
>
> Gabriel
>
> "Wilbur R. Johnson" wrote:
> >
> > I have to second this.
> > Being one of the folks who has sent money to Veridian, I am very
> > pleased with the support I have received.

From jtracy at ist.ucf.edu Mon Nov 5 07:52:07 2001
From: jtracy at ist.ucf.edu (Judd Tracy)
Date: Wed Nov 25 01:01:51 2009
Subject: Using NFS with Scyld (-7 ver.)
In-Reply-To: <20011104170956.A26403@blueraja.scyld.com>
Message-ID:

On Sun, 4 Nov 2001, Sean Dilda wrote:

> On Sun, 04 Nov 2001, Joe Nellis wrote:
>
> > shouldn't point directly at the NFS server)? Secondly, is there
> > something we are missing that must allow requests for /home files on
> > the nodes to pass THROUGH the master?
>
> As far as I know, you cannot get a Linux box to mount an NFS
> filesystem, then re-export it over NFS to another machine. So, as far
> as I know, what you're asking is impossible.

I believe that you can; you need to enable Sun compatibility for that. I
have not tested it, but I remember someone saying that you could.

> I might suggest giving people their own home directories on the master
> and just teaching them how to scp over the files they need.

--
Judd Tracy
Institute for Simulation and Training
jtracy@ist.ucf.edu

From alazur at plogic.com Mon Nov 5 09:44:26 2001
From: alazur at plogic.com (Adam Lazur)
Date: Wed Nov 25 01:01:51 2009
Subject: Using NFS with Scyld (-7 ver.)
In-Reply-To: <20011104170956.A26403@blueraja.scyld.com>
References: <002501c1657a$00c4d680$73f2a540@dslextreme.com> <20011104170956.A26403@blueraja.scyld.com>
Message-ID: <20011105124426.B12093@clustermonkey.org>

Sean Dilda (agrajag@scyld.com) said:
> As far as I know, you cannot get a Linux box to mount an NFS
> filesystem, then re-export it over NFS to another machine. So, as far
> as I know, what you're asking is impossible.

Exporting an NFS mount via NFS is possible if you use the user-space
nfsd (as opposed to the now-standard knfsd). The option for this is
somewhere in the manpages.

--
Adam Lazur
Special Forces, Paralogic Inc.

From Daniel.Kidger at quadrics.com Mon Nov 5 09:58:26 2001
From: Daniel.Kidger at quadrics.com (Daniel Kidger)
Date: Wed Nov 25 01:01:51 2009
Subject: Using NFS with Scyld (-7 ver.)
Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA739D2E1@stegosaurus.bristol.quadrics.com>

> As far as I know, you cannot get a Linux box to mount an NFS
> filesystem, then re-export it over NFS to another machine. So, as far
> as I know, what you're asking is impossible.

I have never seen re-exporting a directory work. What you can do is
routing with ipchains, so all nodes on a cluster's private ethernet can
mount a filesystem on an external system.

Also on the subject: I found that auto-mounting /home on demand (see
/etc/auto.misc) was much more reliable than trying to keep the mounts
permanently up.

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger@quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

From gkogan at students.uiuc.edu Mon Nov 5 10:19:06 2001
From: gkogan at students.uiuc.edu (german kogan)
Date: Wed Nov 25 01:01:51 2009
Subject: problems with Scyld
Message-ID:

Hi.

I am having problems with booting up slave nodes. Every time I try, I
get an error in the state column in BeoSetup. I looked in the log file
for that node and it said:

  setup_libs: Copying libraries to node 2...
  tar: lib/ld-2.1.3.so: Cannot write: No space left on device
  tar: Error exit delayed from previous errors
  Library copy to node failed. (rootfs=/rootfs)

I cleaned up, deleting most of the partitions on that node using the
fdisk utility from the Windows 98 startup disk, but it still gives me
the same error. If somebody can help me out I would greatly appreciate
it.

Thanks

From becker at scyld.com Mon Nov 5 11:03:06 2001
From: becker at scyld.com (Donald Becker)
Date: Wed Nov 25 01:01:51 2009
Subject: Using NFS with Scyld (-7 ver.)
In-Reply-To: <010C86D15E4D1247B9A5DD312B7F5AA739D2E1@stegosaurus.bristol.quadrics.com>
Message-ID:

On Mon, 5 Nov 2001, Daniel Kidger wrote:

> > As far as I know, you cannot get a Linux box to mount an NFS
> > filesystem, then re-export it over NFS to another machine. So, as
> > far as I know, what you're asking is impossible.
>
> I have never seen re-exporting a directory work.

It does work: I wrote the original user-level NFS server (unfsd) used by
Linux, and re-exporting was one of the primary advantages over the Sun
implementation. Having a per-client user ID map was another.

> What you can do is routing with ipchains, so all nodes on a cluster's
> private ethernet can mount a filesystem on an external system.

That's a better approach for most clusters; however, you can get better
caching when using re-export from the master. The NFS consistency
problem on writes affects either approach. We recommend only using NFS
for small read-only configuration files in /home.

Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993

From becker at scyld.com Mon Nov 5 11:05:25 2001
From: becker at scyld.com (Donald Becker)
Date: Wed Nov 25 01:01:51 2009
Subject: problems with Scyld
In-Reply-To:
Message-ID:

On Mon, 5 Nov 2001, german kogan wrote:

> I am having problems with booting up slave nodes. Every time I try, I
> get an error in the state column in BeoSetup. I looked in the log file
> for that node and it said:
>
>   setup_libs: Copying libraries to node 2...
>   tar: lib/ld-2.1.3.so: Cannot write: No space left on device
>   tar: Error exit delayed from previous errors
>   Library copy to node failed. (rootfs=/rootfs)

How much memory do you have on the slave nodes? If less than 64MB, you
will have to trim the library list. Or better, buy 128MB or 256MB DIMMs,
which are now the minimum that systems should economically have.
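A sketch of what trimming the list might look like (the library names
and versions below are illustrative, and the exact keywords can differ
between releases -- check the comments in your own /etc/beowulf/config
before editing):

  # /etc/beowulf/config (excerpt)
  # Each "libraries" line names files copied to the slave nodes' RAM
  # disk at boot; dropping entries your jobs never load shrinks the
  # footprint on small-memory nodes.
  libraries /lib/ld-2.1.3.so /lib/libc-2.1.3.so /lib/libm-2.1.3.so
  #libraries /usr/lib/libstdc++-libc6.1-2.so.3   # drop if no C++ jobs

New boot images pick up the change the next time the nodes come up.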
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993

From edwards at icantbelieveimdoingthis.com Mon Nov 5 12:05:27 2001
From: edwards at icantbelieveimdoingthis.com (Art Edwards)
Date: Wed Nov 25 01:01:51 2009
Subject: Memory on Scyld systems
Message-ID: <20011105130527.A32021@icantbelieveimdoingthis.com>

I have a question about memory on AMD-based clusters. I am now running a
homogeneous Scyld cluster with 768MB on each node. I have modified the
config file with a mem= command and have had no problems. Now I am
augmenting the cluster with new nodes that have 1.5GB of memory on each
node (single-processor nodes). Is there a way to use a different config
file for the new nodes? Also, I have heard that there have been problems
with 1.5GB of memory on some systems. Is this a consistent problem?

Art Edwards

P.S. I'm running Scyld 27Bz-7.

From jtracy at ist.ucf.edu Mon Nov 5 11:58:01 2001
From: jtracy at ist.ucf.edu (Judd Tracy)
Date: Wed Nov 25 01:01:51 2009
Subject: Athlon MP vs Athlon XP
In-Reply-To: <200111052011.XAA04207@nocserv.free.net>
Message-ID:

My understanding is that the only difference is that they are tested and
guaranteed to work in MP configurations. AMD has said that they will not
replace processors that do not work in MP configs unless they are MP
certified.

On Mon, 5 Nov 2001, Mikhail Kuzminsky wrote:

> Dear colleagues,
>
> I think about buying Tyan S2460 motherboards for a Beowulf. According
> to the data I have, Athlon XP (Palomino core) microprocessors can work
> successfully w/this mobo.
>
> But there are also Athlon MP microprocessors w/the same Palomino core,
> w/the same OPGA package, w/the same voltages, and w/the same
> frequencies beginning from 1333 (1500+). They cost, as I understand,
> more than the corresponding XP models.
>
> Sorry, what is the difference between MP and XP chips? Both, if my
> source was correct, support cache coherence.
>
> Yours,
> Mikhail Kuzminsky
> Zelinsky Institute of Organic Chemistry
> Moscow

--
Judd Tracy
Institute for Simulation and Training
jtracy@ist.ucf.edu

From math at velocet.ca Mon Nov 5 13:35:40 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:51 2009
Subject: Athlon MP vs Athlon XP
In-Reply-To: ; from jtracy@ist.ucf.edu on Mon, Nov 05, 2001 at 02:58:01PM -0500
References: <200111052011.XAA04207@nocserv.free.net>
Message-ID: <20011105163540.V27471@velocet.ca>

On Mon, Nov 05, 2001 at 02:58:01PM -0500, Judd Tracy's all...

> My understanding is that the only difference is that they are tested
> and guaranteed to work in MP configurations. AMD has said that they
> will not replace processors that do not work in MP configs unless they
> are MP certified.

AMD contracts out to do installations and guarantees them to this
degree? That's gotta be a pretty penny.

/kc

> On Mon, 5 Nov 2001, Mikhail Kuzminsky wrote:
>
> > Dear colleagues,
> >
> > I think about buying Tyan S2460 motherboards for a Beowulf.
> > According to the data I have, Athlon XP (Palomino core)
> > microprocessors can work successfully w/this mobo.
> >
> > But there are also Athlon MP microprocessors w/the same Palomino
> > core, w/the same OPGA package, w/the same voltages, and w/the same
> > frequencies beginning from 1333 (1500+).
They cost, as I understand, more than the corresponding > > XP models. > > > > Sorry, what is the difference between MP and XP chips ? Both, > > if my source was correct, support cache coherence. > > > > Yours > > Mikhail Kuzminsky > > Zelinsky Institute of Organic Chemistry > > Moscow > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
-- Judd Tracy Institute for Simulation and Training jtracy@ist.ucf.edu
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
-- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA
From eric at fnordsystems.com Mon Nov 5 14:41:38 2001 From: eric at fnordsystems.com (Eric Kuhnke) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: Message-ID: Of course, if an AthlonXP used in a dual-processor board ever dies, you can always say to AMD "Yes, we were using it in a #NAME_OF_SINGLE_PROCESSOR_MOTHERBOARD, and it randomly died". It's pretty rare for a CPU to fail by itself; 99.9% of the time it's the result of a heatsink fan failing, or the heatsink somehow coming loose from the socket. The other 0.1% is power surges, lightning strikes, and things like that... Eric Kuhnke Lead Engineer / Operations Manager Fnord Datacenter Systems Inc. eric@fnordsystems.com www.fnordsystems.com voice: +1-360-527-3301 > -----Original Message----- > From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On > Behalf Of Judd Tracy > Sent: Monday, November 05, 2001 11:58 AM > To: Mikhail Kuzminsky > Cc: beowulf@beowulf.org > Subject: Re: Athlon MP vs Athlon XP > > My understanding is the only difference is that they are tested and > guaranteed to work in MP configurations. AMD has said that > they will not > replace processors that do not work in MP configs unless they are MP > certified. > > On Mon, 5 Nov 2001, Mikhail Kuzminsky wrote: > > > Dear colleagues, > > > > I think about buying Tyan S2460 motherboards for Beowulf. > > According to the data I have, Athlon XP (Palomino core) microprocessors > > can work successfully w/these mobos. > > > > But there are also Athlon MP microprocessors w/same Palomino core, > > w/same OPGA package, w/same voltages and w/same frequencies beginning > > from 1333 (1500+). They cost, as I understand, more than the > corresponding > > XP models. > > > > Sorry, what is the difference between MP and XP chips ? Both, > > if my source was correct, support cache coherence.
> > > > Yours > > Mikhail Kuzminsky > > Zelinsky Institute of Organic Chemistry > > Moscow > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
-- Judd Tracy Institute for Simulation and Training jtracy@ist.ucf.edu
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From edwards at icantbelieveimdoingthis.com Mon Nov 5 14:50:37 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:51 2009 Subject: Problems with Scyld Message-ID: <20011105155037.A32324@icantbelieveimdoingthis.com> I am attempting to install 16 new nodes on an existing Scyld network (Scyld 27Bz-7) with little success. The new nodes have 3com905CX ethernet cards. When I attempt to use the standard Scyld tools, boot the slave node, drag the new MAC address to the list and click apply, nothing happens. The slave node continues to issue RARP attempts. When I build one of the new nodes into a head node and attempt the same process, the MAC address does not appear in the new addresses column. It seems as if the new ethernet cards can send, but not receive. Any help would be appreciated. Art Edwards
From becker at scyld.com Mon Nov 5 15:53:31 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:51 2009 Subject: Problems with Scyld In-Reply-To: <20011105155037.A32324@icantbelieveimdoingthis.com> Message-ID: On Mon, 5 Nov 2001, Art Edwards wrote: > I am attempting to install 16 new nodes on an existing Scyld network > (Scyld 27Bz-7) with little success. The new nodes have 3com905CX ethernet > cards. [...] It seems as if the new ethernet cards > can send, but not receive. The new "CX" cards require an updated driver. The update is in 27*z-8, or you can compile the driver update set from ftp://ftp.scyld.com/pub/network/netdrivers-3.0-1.src.rpm You'll have to create new boot images and second stage images. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993
From kalyanakrishna at yahoo.com Tue Nov 6 01:51:11 2001 From: kalyanakrishna at yahoo.com (Chadalavada Kalyana Krishna) Date: Wed Nov 25 01:01:51 2009 Subject: ch_p4 Error -> System Hangs Message-ID: <20011106095111.19316.qmail@web10507.mail.yahoo.com> Hello all, I am working on a 7 node Linux cluster (6 compute nodes, 1 FS). I tried to run a simple Hello World program. The C program went through without any glitches. When I tried the same in FORTRAN, the system from which the program was started hung. I could not trace the problem to any s/w or installation issue, though I am not sure about it. Repeated attempts to run the same resulted in hanging of n09, n11, n13, n14, n15. I was not able to ping the systems. But I also do not understand why n10 did not hang, though I ran the program there too. The display is: Code: some numbers.
Aiee: Killed interrupt handler Kernel panic: Interrupt handler not syncing One important point is that we have configured mpich to use ssh instead of rsh for communication. with regards, Kalyan.Ch ===== ------------------------------------------------------------ Ch.Kalyana Krishna, Parallel Processing Group, National PARAM Super Computing Facility, Center for Development of Advanced Computing, Pune University Campus, Pune - 411 007, India. Ph: Off:+91-20-5694080 Res: +91-20-589255
From edwards at icantbelieveimdoingthis.com Tue Nov 6 07:30:19 2001 From: edwards at icantbelieveimdoingthis.com (Art Edwards) Date: Wed Nov 25 01:01:51 2009 Subject: Problems with Scyld In-Reply-To: ; from becker@scyld.com on Mon, Nov 05, 2001 at 06:53:31PM -0500 References: <20011105155037.A32324@icantbelieveimdoingthis.com> Message-ID: <20011106083019.B1353@icantbelieveimdoingthis.com> Thanks very much for the reply. I'm trying to blend AMD nodes with 1.5 GB of memory with existing AMD nodes with 0.75 GB. There is a config file in /etc/beowulf that feeds either the second or third boot phase and contains a mem= command. Is there a way within Scyld to specify a different config file for different nodes? Art Edwards On Mon, Nov 05, 2001 at 06:53:31PM -0500, Donald Becker wrote:
From becker at scyld.com Tue Nov 6 07:55:35 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:51 2009 Subject: ch_p4 Error -> System Hangs In-Reply-To: <20011106095111.19316.qmail@web10507.mail.yahoo.com> Message-ID: On Tue, 6 Nov 2001, Chadalavada Kalyana Krishna wrote: > I am working on a 7 node Linux cluster (6 compute > nodes, 1 FS). What system? (Kernel version, etc.) > the system from which the program was started hung. I > could not trace the problem to any s/w or installation > issue, though I am not sure about it. > > Repeated attempts to run the same resulted in hanging > of n09, n11, n13, n14, n15. I was not able to ping > the systems. But I also do not understand why n10 did > not hang, though I ran the program there too. > > The display is: > > Code: some numbers. > > Aiee: Killed interrupt handler You have a kernel crash. Given that it didn't occur on all systems, you should look first for a hardware problem, especially memory corruption. > One important point is that we have configured mpich > to use ssh instead of rsh for communication. This is likely not related to a kernel crash. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993
From agrajag at scyld.com Tue Nov 6 08:16:36 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:51 2009 Subject: Memory on Scyld systems In-Reply-To: <20011105130527.A32021@icantbelieveimdoingthis.com>; from edwards@icantbelieveimdoingthis.com on Mon, Nov 05, 2001 at 01:05:27PM -0700 References: <20011105130527.A32021@icantbelieveimdoingthis.com> Message-ID: <20011106111636.A28908@blueraja.scyld.com> On Mon, 05 Nov 2001, Art Edwards wrote: > I have a question about memory on AMD-based clusters. I am now running > a homogeneous Scyld cluster with 768MB on each node. I have modified the > config file with a mem= command and have had no problems. Now I am augmenting > the cluster with new nodes that have 1.5 GB of memory on each node (single > processor nodes). Is there a way to use a different config file for the new > nodes? The kernel command line is stored in the bootfile /var/beowulf/boot.img. By default, this image is sent to all nodes, but if the file /var/beowulf/boot.img.<node> (i.e. boot.img.0) exists, it will use that image for the given node. You can create this image by modifying /etc/beowulf/config.boot, then running beoboot -2 -n <node> -o /var/beowulf/boot.img.<node>. Once you've created one copy of the new bootfile, you should be able to symlink or hardlink the other filenames to it so that boot.img.<node> for all your new nodes points to it. Just remember, whenever you change anything with your bootfiles in the future, you're going to have to remake both files.
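To make that recipe concrete, a worked example for a hypothetical node 5 -- the node numbers, the link targets and the mem= value are illustrative only; the beoboot flags are as given above:

# put the new kernel command line (e.g. mem=1536M) in the boot config
vi /etc/beowulf/config.boot
# build a phase-2 boot image just for node 5
beoboot -2 -n 5 -o /var/beowulf/boot.img.5
# point the other new 1.5GB nodes at the same image
ln /var/beowulf/boot.img.5 /var/beowulf/boot.img.6
ln /var/beowulf/boot.img.5 /var/beowulf/boot.img.7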
From snguyen at hotmail.com Tue Nov 6 22:08:38 2001 From: snguyen at hotmail.com (Son Nguyen) Date: Wed Nov 25 01:01:51 2009 Subject: problems with scyld - slave nodes Message-ID: >Message: 4 >Date: Mon, 5 Nov 2001 12:19:06 -0600 (CST) >From: german kogan >To: >Subject: problems with Scyld > >I am having problems with booting up slave nodes. Every time I try to do >it I get an error in the state column in the BeoSetup. I looked in the log >file for that node and it said >" setup_libs: Copying libraries to node 2... >tar:lib/ld-2.1.3.so: Cannot write: No space left on device >tar: Error exit delayed from previous errors >Library copy to node failed. (rootfs=/rootfs)" >[...] German, it is not RAM, it is partition allocation: your / partition is not big enough. Here is a suggestion:

fat    50mb           beoboot
swap   256mb          swap
/      rest (1.4gig)  rest

I have also found out that on certain testing of the filesystem, I can load 100% of the / partition. After a reboot, the slave node does not allow the full active state due to lack-of-space issues. Good luck Sonny Nguyen Senior Networking and Distributed Systems Engineer The Mitre Corporation
>Message: 12 >Date: Mon, 5 Nov 2001 15:50:37 -0700 >To: beowulf@beowulf.org >Subject: Problems with Scyld >From: Art Edwards > >I am attempting to install 16 new nodes on an existing Scyld network >(Scyld 27Bz-7) with little success. The new nodes have 3com905CX ethernet >cards. [...] It seems as if the new ethernet >cards can send, but not receive. > >Any help would be appreciated. > >Art Edwards Art, 1) there is something wrong with the server. 2) take a look in /var/beowulf to see if the file unknown_addresses exists; if not, touch the file and retry the client node. Good Luck Sonny Nguyen Senior Networking and Distributed Systems Engineer The Mitre Corporation
Here is a new question. I have just received 27cz-8a. Built the server. When trying to boot the slave nodes, the server sees each one, accepts it and distributes an IP for the client without any intervention. The slave node then failed to do the second boot phase...with the error... neighbor table overflow.... this is a fresh install. I have never seen this on 27bz-7 Sonny Nguyen
From agrajag at scyld.com Tue Nov 6 14:24:55 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:51 2009 Subject: problems with scyld - slave nodes In-Reply-To: ; from snguyen@hotmail.com on Tue, Nov 06, 2001 at 10:08:38PM +0000 References: Message-ID: <20011106142455.C19207@kotako.analogself.com> On Tue, 06 Nov 2001, Son Nguyen wrote: > German, it is not RAM, it is partition allocation: your / partition is not > big enough. [...] > I have also found out that on certain testing of the filesystem, I can load > 100% of the / partition. After a reboot, the slave node does not allow the > full active state due to lack-of-space issues. The problem is with the / partition. However, on a fresh install of Scyld Beowulf, the / partition is a ramdisk, which means running out of RAM may be the problem. It won't try to put anything on the hard drive of the slave node until you bring the nodes up with ramdisks and partition them, then change /etc/beowulf/fstab.
From javier.iglesias at freesurf.ch Wed Nov 7 07:11:31 2001 From: javier.iglesias at freesurf.ch (Javier Iglesias) Date: Wed Nov 25 01:01:51 2009 Subject: ExtremeNetworks Summit and channel bonding with Scyld Message-ID: <1005145891.webexpressdV3.1.f@smtp.freesurf.ch> Hi all, We are about to build a 19 bi-athlon/Tyan Tiger/Scyld cluster for academic research in the field of genetic programming, and large neural networks. We'd like to use an Extreme Networks Summit 48 ethernet switch -> http://www.extremenetworks.com/products/datasheets/summit24.asp connecting (highly recommended here recently :) Netgear FA310TX NICs -> http://www.netgear.com/product_view.asp?xrp=1&yrp=1&zrp=4 Here come the questions : * has anyone experienced channel bonding on that switch ? * any Gigabit NIC recommendation for the master node ? * is it possible/necessary to channel bond Gigabit interfaces ? Thanks in advance for your help !! --javier
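For readers new to Javier's question, here is what plain-kernel channel bonding looks like, independent of Scyld -- the interface names and addresses are made up, and per Florent Calvayrac's note further on, stock Scyld needs beoboot and kernel changes on top of this:

# Linux 2.4 bonding sketch: load the driver, bring up the bond,
# then enslave two fast-ethernet NICs
modprobe bonding
ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1
# the matching switch ports must be configured as an aggregated trunk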
From Kian_Chang_Low at vdgc.com.sg Wed Nov 7 08:01:48 2001 From: Kian_Chang_Low at vdgc.com.sg (Kian_Chang_Low@vdgc.com.sg) Date: Wed Nov 25 01:01:51 2009 Subject: Promise SX6000 vs Adaptec 2400A Message-ID: Hi all, With 3ware exiting the IDE RAID storage card market, I was looking for a replacement for a cluster and came across the Promise Supertrak SX6000 and Adaptec ATA RAID 2400A. I have no experience with the above and hope someone might shed some light. 1) Which card has the better support for Linux? I had heard that Promise is not very Linux-friendly and tends to lock the user to older kernels. Is that true? 2) Does anyone have experience putting more than one Promise card in a system? Is it possible? 3) Are there any other alternatives? Thanks, Kian Chang.
From cblack at eragen.com Wed Nov 7 08:17:29 2001 From: cblack at eragen.com (Chris Black) Date: Wed Nov 25 01:01:51 2009 Subject: ExtremeNetworks Summit and channel bonding with Scyld In-Reply-To: <1005145891.webexpressdV3.1.f@smtp.freesurf.ch>; from javier.iglesias@freesurf.ch on Wed, Nov 07, 2001 at 04:11:31PM +0100 References: <1005145891.webexpressdV3.1.f@smtp.freesurf.ch> Message-ID: <20011107111729.B7496@getafix.EraGen.com> On Wed, Nov 07, 2001 at 04:11:31PM +0100, Javier Iglesias wrote: > We are about to build a 19 bi-athlon/Tyan Tiger/Scyld cluster > for academic research in the field of genetic programming, and > large neural networks. [...] > * has anyone experienced channel bonding on that switch ? > * any Gigabit NIC recommendation for the master node ? > * is it possible/necessary to channel bond Gigabit interfaces ? I have no experience with that switch, but have a few comments... For gigabit NICs for Linux, we have had good experience with the NetGear GA620 cards (not the GA622s, which are a different chipset). They are well supported by the acenic driver and function well for us. As for channel bonding, I really don't think you'll need it for genetic programming and neural networks, as those aren't traditionally very-high-bandwidth applications if I am thinking about them correctly. (That is, if by genetic programming you mean genetic algorithms and not bioinformatics.) Given the added complexity and time needed to implement channel bonding, I just don't think it would be worth it in this case. It seems to me that many cluster workloads work fine with just fast ethernet to the nodes. Chris
From rgb at phy.duke.edu Wed Nov 7 08:26:39 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:51 2009 Subject: RDRAM vs SDRAM redux Message-ID: Dear List Humans, Life continues to get more puzzling all the time. We are working out final configurations for a mixed purchase of P4's and Athlon XP's. Or so I thought when I started to review the hardware alternatives this morning.
I'm basically getting ready to update a quote from three months ago but the world has of course changed substantially in the meantime. The Athlon update was fairly easy. It looks like the KT266A chipset is probably the one of choice for a single CPU solution (which I'm inclined to) and in the meantime 512 MB DDR PC2100 DIMMS are now cheaper than 256 DIMMS were in the first quote. Also choosing the XP for a single CPU choice is a no-brainer. The P4's are much more difficult because there are now SDRAM chipsets. Does anyone have words of wisdom (or benchmarks!) to offer for the performance of P4's running e.g. lattice QCD or other numbers, especially those illustrating differences between code that uses SSE instructions? I already found http://qcdhome.fnal.gov/cluster_design/benchmarks.html but it is a bit dated (being all of five months old:-) and doesn't include KT266A and XP OR SDRAM-equipped P4's. I'm especially interested on what the best choice would be for a P4 intended to do well on memory-intensive code, e.g. Intel 845 (SDRAM but CPU up to 2 GHz) or 850 (RDRAM but only 1.8 GHz?) or SiS 645 (DDR up to 2 GHz) as there are getting to be a truly dazzling array of alternatives. An obvious question is whether or not our lattice QCD folks and/or quark-gluon plasma folks really need to get the P4's to hedge their bets at this point. The benchmark results above at FNAL show the P4 holding a small (~20%) lead over the Palomino out in the large lattice sizes likely to be dominated by memory speed. The stream results for the P4, especially with SSE instructions, still are much better for the P4 than (say) the Palomino, but the KT266A suppposedly delivers 20-30% better DDR performance than the KT266 did (and maybe than the AMD 760 used on the Tyan Thunder?). There is also no clear indication on whether using an SSE compiler with the XP makes a difference -- does the XP support SSE1 and/or SSE2 instructions? Sigh. Any help on these questions would be greatly appreciated. Also, if the FNAL folks are listening and have some newer boxes handy, it would be fabulous of you to update your benchmarks above. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From jlb17 at duke.edu Wed Nov 7 08:46:22 2001 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed Nov 25 01:01:51 2009 Subject: Promise SX6000 vs Adaptec 2400A In-Reply-To: Message-ID: On Thu, 8 Nov 2001 at 12:01am, Kian_Chang_Low@vdgc.com.sg wrote > With 3ware existing the IDE raid storage card market, I was looking for a > replacement for a cluster and came across the Promise Supertrak SX6000 and > Adaptec ATA RAID 2400A. FWIW, someone with a *lot* of interest in big storage systems recently posted to the linux-ide-arrays list that 3ware have reversed their decision and will be getting back into the IDE raid card business (including releasing the 7850 RSN). A press release is supposed to be forthcoming. The Escalade pages are already back up. 
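As an aside to rgb's RDRAM-vs-SDRAM question above: the stream figures he cites are easy to re-measure on candidate boxes. A rough sketch -- the download URL and build flags are from memory and may need adjusting for your setup:

# fetch and build McCalpin's STREAM benchmark (C version)
wget http://www.cs.virginia.edu/stream/FTP/Code/stream.c
gcc -O3 -funroll-loops stream.c -o stream
./stream     # reports Copy/Scale/Add/Triad bandwidth in MB/s
# rebuild with an SSE-capable compiler (e.g. Intel's icc) and compare
# the Triad figures to gauge what the SSE question is worth on your code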
-- Joshua Baker-LePain Department of Biomedical Engineering Duke University From jtracy at ist.ucf.edu Wed Nov 7 07:52:00 2001 From: jtracy at ist.ucf.edu (Judd Tracy) Date: Wed Nov 25 01:01:51 2009 Subject: ExtremeNetworks Summit and channel bonding with Scyld In-Reply-To: <1005145891.webexpressdV3.1.f@smtp.freesurf.ch> Message-ID: On Wed, 7 Nov 2001, Javier Iglesias wrote: > Hi all, > > We are about to build a 19 bi-athlon/Tyan Tiger/Scyld cluster > for academic research in the field of genetic programming, and > large neural networks. > > We'd like to use an Extreme Networks Summit 48 ethernet switch > -> http://www.extremenetworks.com/products/datasheets/summit24.asp > connecting (highly recommended here recently :) Netgear FA310TX NICs > -> http://www.netgear.com/product_view.asp?xrp=1&yrp=1&zrp=4 > > Here come the questions : > * has anyone experienced channel bonding on that switch ? I am having a representative from extreme bring a switch by our lab to test out chanel bonding. > * any Gigabit NIC recommandation for the master node ? > * is it possible/necessary to channel bond Gigabit interfaces ? You can, but you might not get much benefit. Make sure that you are using 64 bit cards because the 32 bit pci bus can't really handle two cards. > Thanks in advance for your help !! > > --javier > > -- > Kate Stevensen sagt: Meine Mission ist geheim! Finde es raus! > http://www.sunrise.net/exclude/track/action.asp?PID_S=592&PID_T=593&LID=1 > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Judd Tracy Institute for Simulation and Training jtracy@ist.ucf.edu From Florent.Calvayrac at univ-lemans.fr Wed Nov 7 10:32:49 2001 From: Florent.Calvayrac at univ-lemans.fr (Florent Calvayrac) Date: Wed Nov 25 01:01:51 2009 Subject: ExtremeNetworks Summit and channel bonding with Scyld References: <1005145891.webexpressdV3.1.f@smtp.freesurf.ch> Message-ID: <3BE97E51.798825CF@univ-lemans.fr> > We are about to build a 19 bi-athlon/Tyan Tiger/Scyld cluster > > > > Here come the questions : > * has anyone experienced channel bonding on that switch ? This is irrelevant to your question, and maybe you are already aware of the problem, but using channel bonding with ( at least 1 $ / version 7) Scyld requires some modifications to beoboot and to the kernel, of course. We managed to get a solution that works, in case anyone is interested... -- Florent Calvayrac UMR-CNRS 6087 | http://www.univ-lemans.fr/~fcalvay Universite du Maine-Faculte des Sciences | 72085 Le Mans Cedex 9 From brian at chpc.utah.edu Wed Nov 7 11:40:55 2001 From: brian at chpc.utah.edu (Brian Haymore) Date: Wed Nov 25 01:01:51 2009 Subject: Promise SX6000 vs Adaptec 2400A References: Message-ID: <3BE98E47.5060705@chpc.utah.edu> FYI, 3Ware will be announcing shorting that Escalade is not in fact going away. So those that did like the 3ware product this is great news. Joshua Baker-LePain wrote: > On Thu, 8 Nov 2001 at 12:01am, Kian_Chang_Low@vdgc.com.sg wrote > > >>With 3ware existing the IDE raid storage card market, I was looking for a >>replacement for a cluster and came across the Promise Supertrak SX6000 and >>Adaptec ATA RAID 2400A. 
>> > > FWIW, someone with a *lot* of interest in big storage systems recently > posted to the linux-ide-arrays list that 3ware have reversed their > decision and will be getting back into the IDE raid card business > (including releasing the 7850 RSN). A press release is supposed to be > forthcoming. The Escalade pages are already back up. > > -- Brian D. Haymore University of Utah Center for High Performance Computing 155 South 1452 East RM 405 Salt Lake City, Ut 84112-0190 Email: brian@chpc.utah.edu - Phone: (801) 585-1755 - Fax: (801) 585-5366 From yoon at bh.kyungpook.ac.kr Wed Nov 7 23:11:01 2001 From: yoon at bh.kyungpook.ac.kr (Yoon Jae Ho) Date: Wed Nov 25 01:01:51 2009 Subject: HPL residual check failure References: Message-ID: <004401c16824$8a17a000$5f72f2cb@LocalHost> I found your e-mail today. I can't find your system information in your e-mail. but I guess if you use Myrinet instead of 10/100 LAN. then please check the Cable & Myrinet mpich version. If you use 10/100 LAN, then I guess your failure for the matrix size 23,000 is related to RAM . Please check the RAM size & Physical Problems of your Workstations RAM first. There may be problems in the heat but I think in the normal temperature, Workstation must be endured without fail. Have a NIce Day. --------------------------------------------------------------------- Yoon Jae Ho Economist POSCO Research Institute yoon@bh.kyungpook.ac.kr jhyoon@mail.posri.re.kr http://ie.korea.ac.kr/~supercom/ Korea Beowulf Supercomputer http://members.ud.com/services/teams/team.htm?id=264C68D5-CB71-429F-923D-8614F419065D Help the people with your PC Imagination is more important than knowledge. A. Einstein "??????? ??? ???" ??? ??, " ??? ??? ??" ?? ??? ??(???? ???? ???? ??) "????? '???? ????? ??'??? ? ? ???, ??? ??? ????? ??? ????." ?? ??? "??? ?? ??? ??? ??? ??? ??? ??" ??? 2000.4.22 "???? ???? ?? ??? ??? ??? ????" ? ?? 2000.4.29 "???? ??? ??? ??? ??? ????" ? ?? 2000.4.24 http://www.kichun.co.kr 2001.1.6 http://www.c3tv.com 2001.1.10 ------------------------------------------------------------------------ ----- Original Message ----- From: ??? To: Sent: Monday, September 03, 2001 9:22 PM Subject: HPL residual check failure > Hi > When I was doing HPL benchmark test using big matrix(bigger than 20,000 ) with many linux server(more than 20), sometimes I got residual check error as attached. > When I got residual check error, I turned off my linux servers for several hours and then tried again. And usually it worked - I don't know the reason. > Heat is suspicious. But, is it really heat problem? > Is there anybody who have experienced similar problem or know the reason? > please help me. > > Thanks in advance! > > Keaton > > > HPL result files------------------------------------------------------------ > > ============================================================================ > T/V N NB P Q Time Gflops > ---------------------------------------------------------------------------- > W11R2C4 21000 200 6 6 702.80 8.786e+00 > ---------------------------------------------------------------------------- > ||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0272768 ...... PASSED > ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0140749 ...... PASSED > ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0026585 ...... 
> ============================================================================
> T/V           N    NB  P  Q    Time      Gflops
> ----------------------------------------------------------------------------
> W11R2C4   23000   200  6  6  866.35   9.364e+00
> ----------------------------------------------------------------------------
> ||Ax-b||_oo / ( eps * ||A||_1 * N )        = 3255.3898794 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )  = 7833.1904572 ...... FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1364.3123654 ...... FAILED
> ||Ax-b||_oo . . . . . . . . . . . . . . . . . = 0.000049
> ||A||_oo . . . . . . . . . . . . . . . . . . . = 5827.145943
> ||A||_1 . . . . . . . . . . . . . . . . . . . = 5836.795619
> ||x||_oo . . . . . . . . . . . . . . . . . . . = 2.390054
> _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
From Kian_Chang_Low at vdgc.com.sg Thu Nov 8 00:04:44 2001 From: Kian_Chang_Low at vdgc.com.sg (Kian_Chang_Low@vdgc.com.sg) Date: Wed Nov 25 01:01:51 2009 Subject: Promise SX6000 vs Adaptec 2400A Message-ID: Hi, Thanks for the info. I was wondering whether anyone on the list has experience with either the Promise or the Adaptec cards and would like to share their experience? Especially with drivers for the newer kernels. Thanks, Kian Chang. Joshua Baker-LePain wrote on 11/08/01 12:46 AM: On Thu, 8 Nov 2001 at 12:01am, Kian_Chang_Low@vdgc.com.sg wrote > With 3ware exiting the IDE RAID storage card market, I was looking for a > replacement for a cluster and came across the Promise Supertrak SX6000 and > Adaptec ATA RAID 2400A. FWIW, someone with a *lot* of interest in big storage systems recently posted to the linux-ide-arrays list that 3ware have reversed their decision and will be getting back into the IDE RAID card business (including releasing the 7850 RSN). A press release is supposed to be forthcoming. The Escalade pages are already back up. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University
From leunen.d at fsagx.ac.be Thu Nov 8 03:56:05 2001 From: leunen.d at fsagx.ac.be (David Leunen) Date: Wed Nov 25 01:01:51 2009 Subject: Scyld iso image Message-ID: <3BEA72D5.FA062766@fsagx.ac.be> Hello, Does anyone know an ftp site where I can find the iso image of the latest Scyld? I really can't wait for the CD from Linux Central (and it is an older version). I have a pretty fast connection, so it shouldn't take long. I would very much appreciate it if you could provide it for me. You can answer to my personal e-mail or on this mailing list. Thank you. David
From patrick at myri.com Thu Nov 8 03:53:08 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:51 2009 Subject: HPL residual check failure References: <004401c16824$8a17a000$5f72f2cb@LocalHost> Message-ID: <3BEA7224.C9C36C12@myri.com> Yoon Jae Ho wrote: > if you use Myrinet instead of 10/100 LAN, then please check the cable & Myrinet mpich version. FYI, bad Myrinet cables do not produce corrupted data; there is a hardware CRC check on the NIC. Corrupted packets are just dropped, so symptoms of bad cables are messages timing out or very slow.
You can look at the number of bad CRCs (badcrc_cnt) with " gm_counters" (if you are using GM). In the context of Keaton's failure, bad memory is certainely the problem. Usually, if things works after cooling the unit, it's very likely to be overheating hardware. Patrick ---------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | Myricom, Inc. http://www.myri.com | Cell: 865-389-8852 685 Emory Valley Rd (B) | Phone: 865-425-0978 Oak Ridge, TN 37830 ---------------------------------------------------------- From j.c.burton at gats-inc.com Thu Nov 8 07:39:19 2001 From: j.c.burton at gats-inc.com (John Burton) Date: Wed Nov 25 01:01:51 2009 Subject: Multiple Promise Ultra 100 TX2 controllers... Message-ID: <3BEAA727.3F425C33@gats-inc.com> Greetings! Over the past month, I've been trying to build a 500GB ATA/100 RAID 5 array and have encountered multiple problems along the way. My system consists of: * SuperMicro 370DL3 motherboard w/ Adaptec Ultra 160 SCSI and 100mbit nic (eepro100) onboard. * 2 1GHz PIII processors w/ 512MB memory. * 9GB Quantum Atlas Ultra 160 SCSI system disk. * 100GB Seagate AIT tape Autoloader. * Seagate DDS-3 4mm tape drive. * 6 x 100GB Western Digital ATA/100 disks * 2 x 3Ware hotswap chassis - fits 3 ATA/100 1" disks in a 2 bay area * RedHat 7.2 The latest problem I've been having is with multiple Promise Ultra 100 tx2 controllers - with 6 disks, I need 6 IDE channels which means 3 Ultra 100 controllers. I had purchased one tx2 earlier this year (early spring) and just this past week purchased 2 more. I installed them and connected them to the 6 disks. When I booted the machine, I got the Promise Ultra BIOS screen detecting the drives, and then it displays a list of 8 possible drives (D0 - D7). D0, D2, D4, & D6 have disks listed next to them and D1, D3, D5, & D7 do not have any disks listed (this is expected since I'm only using 1 master drive per channel). What is not expected is that there are only 8 possible drives listed. With 3 controllers, there should be 12 possible drives with 6 drives detected. When Linux started booting, I noticed that all 3 controllers and 6 disks were detected. So far so good. When the kernel started checking for partitions on the disks, it ran into trouble (last two disks giving DMA errors). Below is the appropriate log entries showing what happened. According to the logs it looks like there is a problem with either the 3rd controller or the last 2 disks. I rearranged the order of the controllers (i.e. swapped which cards were installed in which slots) and left the order of the disks the same (first two disks attached to the controller in the first PCI slot, etc). And got the same results (last two disks showing DMA errors). I then changed the order of the disks relative to the PCI slots and still got the same results (last two disks giving DMA errors). I then removed one controller at a time (leaving 2 installed at any one time) and connected various combinations of 4 disks from the available 6. Everything worked fine, with no errors. At this point I'm kinda stuck with the conclusion that only 2 Promise Ultra100 TX2 cards will work in that system at one time. Does anyone have any suggestions? thoughts? help? 
Hopefully waiting, John SYSLOG Entries: Nov 7 13:49:48 oracle kernel: PDC20268: IDE controller on PCI bus 00 dev 20 Nov 7 13:49:48 oracle kernel: PDC20268: chipset revision 1 Nov 7 13:49:48 oracle kernel: PDC20268: not 100%% native mode: will probe irqs later Nov 7 13:49:48 oracle kernel: PDC20268: ROM enabled at 0xfeaf8000 Nov 7 13:49:48 oracle kernel: PDC20268: (U)DMA Burst Bit DISABLED Primary PCI Mode Secondary MASTER Mode. Nov 7 13:49:48 oracle kernel: ide2: BM-DMA at 0xdf90-0xdf97, BIOS settings: hde:pio, hdf:pio Nov 7 13:49:48 oracle kernel: ide3: BM-DMA at 0xdf98-0xdf9f, BIOS settings: hdg:pio, hdh:pio Nov 7 13:49:48 oracle kernel: PDC20268: IDE controller on PCI bus 00 dev 18 Nov 7 13:49:48 oracle kernel: PDC20268: chipset revision 1 Nov 7 13:49:48 oracle kernel: PDC20268: not 100%% native mode: will probe irqs later Nov 7 13:49:48 oracle kernel: PDC20268: ROM enabled at 0xfeaec000 Nov 7 13:49:48 oracle kernel: PDC20268: (U)DMA Burst Bit ENABLED Primary MASTER Mode Secondary MASTER Mode. Nov 7 13:49:48 oracle kernel: ide4: BM-DMA at 0xdf60-0xdf67, BIOS settings: hdi:pio, hdj:pio Nov 7 13:49:48 oracle kernel: ide5: BM-DMA at 0xdf68-0xdf6f, BIOS settings: hdk:pio, hdl:pio Nov 7 13:49:49 oracle kernel: PDC20268: IDE controller on PCI bus 00 dev 10 Nov 7 13:49:49 oracle kernel: PDC20268: chipset revision 1 Nov 7 13:49:49 oracle kernel: PDC20268: not 100%% native mode: will probe irqs later Nov 7 13:49:49 oracle kernel: PDC20268: ROM enabled at 0xfeae4000 Nov 7 13:49:49 oracle kernel: PDC20268: (U)DMA Burst Bit ENABLED Primary MASTER Mode Secondary MASTER Mode. Nov 7 13:49:49 oracle kernel: ide6: BM-DMA at 0xdf30-0xdf37, BIOS settings: hdm:pio, hdn:pio Nov 7 13:49:49 oracle kernel: ide7: BM-DMA at 0xdf38-0xdf3f, BIOS settings: hdo:pio, hdp:pio Nov 7 13:49:49 oracle kernel: ServerWorks OSB4: IDE controller on PCI bus 00 dev 79 Nov 7 13:49:49 oracle kernel: ServerWorks OSB4: chipset revision 0 Nov 7 13:49:49 oracle kernel: ServerWorks OSB4: not 100%% native mode: will probe irqs later Nov 7 13:49:49 oracle kernel: ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio Nov 7 13:49:49 oracle kernel: ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio Nov 7 13:49:49 oracle kernel: hdc: CD-ROM CDU311, ATAPI CD/DVD-ROM drive Nov 7 13:49:49 oracle kernel: hde: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: hdg: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: hdi: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: hdk: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: hdm: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: hdo: WDC WD1000BB-00CCB0, ATA DISK drive Nov 7 13:49:49 oracle kernel: ide1 at 0x170-0x177,0x376 on irq 15 Nov 7 13:49:49 oracle kernel: ide2 at 0xdff0-0xdff7,0xdfe6 on irq 22 Nov 7 13:49:49 oracle kernel: ide3 at 0xdfa8-0xdfaf,0xdfe2 on irq 22 Nov 7 13:49:49 oracle kernel: ide4 at 0xdfa0-0xdfa7,0xdf8e on irq 20 Nov 7 13:49:49 oracle kernel: ide5 at 0xdf80-0xdf87,0xdf8a on irq 20 Nov 7 13:49:49 oracle kernel: ide6 at 0xdf58-0xdf5f,0xdf7e on irq 18 Nov 7 13:49:49 oracle kernel: ide7 at 0xdf50-0xdf57,0xdf4e on irq 18 Nov 7 13:49:49 oracle kernel: blk: queue c0435808, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: blk: queue c0435808, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: hde: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, (U)DMA Nov 7 13:49:49 oracle kernel: blk: queue c0435b4c, I/O limit 4095Mb (mask 0xffffffff) 
Nov 7 13:49:49 oracle kernel: blk: queue c0435b4c, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: hdg: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, (U)DMA Nov 7 13:49:49 oracle kernel: blk: queue c0435e90, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: blk: queue c0435e90, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: hdi: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, UDMA(100) Nov 7 13:49:49 oracle kernel: blk: queue c04361d4, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: blk: queue c04361d4, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:49 oracle kernel: hdk: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, UDMA(100) Nov 7 13:49:50 oracle kernel: blk: queue c0436518, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: blk: queue c0436518, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: hdm: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, UDMA(100) Nov 7 13:49:50 oracle kernel: blk: queue c043685c, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: blk: queue c043685c, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: hdo: 195371568 sectors (100030 MB) w/2048KiB Cache, CHS=193821/16/63, UDMA(100) Nov 7 13:49:50 oracle kernel: ide-floppy driver 0.97.sv Nov 7 13:49:50 oracle kernel: Partition check: Nov 7 13:49:50 oracle kernel: hde: [PTBL] [12161/255/63] hde1 Nov 7 13:49:50 oracle kernel: hdg: [PTBL] [12161/255/63] hdg1 Nov 7 13:49:50 oracle kernel: hdi: [PTBL] [12161/255/63] hdi1 Nov 7 13:49:50 oracle kernel: hdk: [PTBL] [12161/255/63] hdk1 Nov 7 13:49:50 oracle kernel: hdm:hdm: dma_intr: status=0x51 { DriveReady SeekComplete Error } Nov 7 13:49:50 oracle kernel: hdm: dma_intr: error=0x84 { DriveStatusError BadCRC } Nov 7 13:49:50 oracle kernel: hdm: dma_intr: status=0x51 { DriveReady SeekComplete Error } Nov 7 13:49:50 oracle kernel: hdm: dma_intr: error=0x84 { DriveStatusError BadCRC } Nov 7 13:49:50 oracle kernel: hdm: timeout waiting for DMA Nov 7 13:49:50 oracle kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14 Nov 7 13:49:50 oracle kernel: blk: queue c0436518, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: blk: queue c0436518, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: [PTBL] [12161/255/63] hdm1 Nov 7 13:49:50 oracle kernel: hdo:hdo: timeout waiting for DMA Nov 7 13:49:50 oracle kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14 Nov 7 13:49:50 oracle kernel: blk: queue c043685c, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: blk: queue c043685c, I/O limit 4095Mb (mask 0xffffffff) Nov 7 13:49:50 oracle kernel: [PTBL] [12161/255/63] hdo1 From lindahl at conservativecomputer.com Thu Nov 8 07:57:36 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: Using NFS with Scyld (-7 ver.) In-Reply-To: ; from becker@scyld.com on Mon, Nov 05, 2001 at 02:03:06PM -0500 Message-ID: <20011108105736.B12344@wumpus.foo> On Mon, Nov 05, 2001 at 02:03:06PM -0500, Donald Becker wrote: > It does work: I wrote the original user-level NFS server (unfsd) used by > Linux, and re-exporting was one of the primary advantages over the Sun > implementation. Not only does re-exporting work, it works well enough that the CPlant people use it for their single system disk, which is shared by more than 1,000 nodes. 
There's 1 node per rack which mounts the one disk and re-exports it to the other nodes in the rack. The extra caching is very important to them. greg
From lindahl at conservativecomputer.com Thu Nov 8 07:58:14 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: Compile farm? In-Reply-To: <20011104212912.W14001@unthought.net>; from jakob@unthought.net on Sun, Nov 04, 2001 at 09:29:12PM +0100 Message-ID: <20011108105814.C12344@wumpus.foo> On Sun, Nov 04, 2001 at 09:29:12PM +0100, Jakob Østergaard wrote: > Problem is - mosix migrates jobs after a while. Initially a compiler > takes up a few megabytes of memory, but "after a while" it has grown > to hundreds of megabytes. When mosix decides to migrate the compiler > it will spend a long time on the network to move the large process > image. I've never used Mosix. Does it have the ability to set policies like "this binary should always be immediately migrated at exec" or "all processes should be migrated at exec"? You'd think it would... and using such policies would solve this particular problem. greg
From lindahl at conservativecomputer.com Thu Nov 8 07:59:30 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: Promise SX6000 vs Adaptec 2400A In-Reply-To: ; from Kian_Chang_Low@vdgc.com.sg on Thu, Nov 08, 2001 at 12:01:48AM +0800 Message-ID: <20011108105930.F12344@wumpus.foo> On Thu, Nov 08, 2001 at 12:01:48AM +0800, Kian_Chang_Low@vdgc.com.sg wrote: > With 3ware exiting the IDE RAID storage card market, I was looking for a > replacement for a cluster and came across the Promise Supertrak SX6000 and > Adaptec ATA RAID 2400A. If I understand these cards correctly, both are I2O cards. So you can use their proprietary drivers, or you can use Linux's I2O driver, and in both cases, you can put multiple controllers on one system. I haven't tried it personally, though. greg
From raysonlogin at yahoo.com Thu Nov 8 08:14:22 2001 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP Message-ID: <20011108161422.43393.qmail@web11403.mail.yahoo.com> I found something interesting from AMD's developer site: ...processor also features the advanced _MOESI_ cache coherency protocol to ensure efficient cache integrity in a multiprocessing environment. MOESI provides better performance in MP configurations, due to the added O (owned) state. In theory, this can provide better cache performance. Also, please read the section "Why No Dual-Processing Support with Thunderbird?" in the following web page: http://www.creativecow.net/articles/hawes_tyler/amd_mps/athlonmp_760mp_full.html Rayson References: AMD Athlon XP Processor Model 6 Data Sheet TM: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24309.pdf AMD Athlon MP Processor Model 6 Data Sheet Multiprocessor-Capable for Workstation and Server Platforms TM: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24685.pdf
From joelja at darkwing.uoregon.edu Thu Nov 8 07:47:17 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:51 2009 Subject: Multiple Promise Ultra 100 TX2 controllers...
In-Reply-To: <3BEAA727.3F425C33@gats-inc.com> Message-ID: Build a kernel and make sure that in the IDE/ATA section CONFIG_PDC202XX_BURST (the caption says something like "Special UDMA Feature") is enabled... that works around that bug in the Ultra100/Ultra66... there's more info in <path to kernel>/drivers/ide/pdc202xx.c joelja On Thu, 8 Nov 2001, John Burton wrote: > Greetings! > > Over the past month, I've been trying to build a 500GB ATA/100 RAID 5 > array and have encountered multiple problems along the way. [...] > At this > point I'm kinda stuck with the conclusion that only 2 Promise Ultra100 > TX2 cards will work in that system at one time. > > Does anyone have any suggestions? thoughts? help? > [rest of the quoted message and syslog snipped; see John's original post above]
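A sketch of the rebuild Joel describes, assuming a 2.4-series source tree in the usual place -- the exact menu wording varies between kernel versions, so treat the comment as a pointer rather than gospel:

cd /usr/src/linux
make menuconfig   # under "ATA/IDE/MFM/RLL support", enable the
                  # "Special UDMA Feature" option (CONFIG_PDC202XX_BURST=y)
make dep bzImage modules modules_install
# copy arch/i386/boot/bzImage into place, update lilo.conf, rerun lilo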
-- -------------------------------------------------------------------------- Joel Jaeggli joelja@darkwing.uoregon.edu Academic User Services consult@gladstone.uoregon.edu PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E -------------------------------------------------------------------------- It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843.
From cblack at eragen.com Thu Nov 8 08:23:58 2001 From: cblack at eragen.com (Chris Black) Date: Wed Nov 25 01:01:51 2009 Subject: Multiple Promise Ultra 100 TX2 controllers...
In-Reply-To: <3BEAA727.3F425C33@gats-inc.com>; from j.c.burton@gats-inc.com on Thu, Nov 08, 2001 at 10:39:19AM -0500 References: <3BEAA727.3F425C33@gats-inc.com> Message-ID: <20011108112358.A11047@getafix.EraGen.com> So, to summarize: two controllers work fine, three fail. Have you turned on CONFIG_PDC202XX_BURST in your kernel config? It is in the IDE/ATA section of the kernel config. It shows up in menuconfig as "Special UDMA feature" and the help text says: For PDC20246, PDC20262, PDC20265 and PDC20267 Ultra DMA chipsets. Designed originally for PDC20246/Ultra33 that has BIOS setup failures when using 3 or more cards. Unknown for PDC20265/PDC20267 Ultra DMA 100. Please read the comments at the top of drivers/ide/pdc202xx.c. If unsure, say N. It sounds like it might help. Chris On Thu, Nov 08, 2001 at 10:39:19AM -0500, John Burton wrote: > Greetings! > > Over the past month, I've been trying to build a 500GB ATA/100 RAID 5 > array and have encountered multiple problems along the way. My system > consists of: > > * SuperMicro 370DL3 motherboard w/ Adaptec Ultra 160 SCSI and 100mbit > nic (eepro100) onboard. > * 2 1GHz PIII processors w/ 512MB memory. > * 9GB Quantum Atlas Ultra 160 SCSI system disk. > * 100GB Seagate AIT tape Autoloader. > * Seagate DDS-3 4mm tape drive. > * 6 x 100GB Western Digital ATA/100 disks > * 2 x 3Ware hotswap chassis - fits 3 ATA/100 1" disks in a 2 bay area > * RedHat 7.2 > > The latest problem I've been having is with multiple Promise Ultra 100 > tx2 controllers - with 6 disks, I need 6 IDE channels which means 3 > Ultra 100 controllers. I had purchased one tx2 earlier this year (early > spring) and just this past week purchased 2 more. I installed them and > connected them to the 6 disks. > > When I booted the machine, I got the Promise Ultra BIOS screen detecting > the drives, and then it displays a list of 8 possible drives (D0 - D7). > D0, D2, D4, & D6 have disks listed next to them and D1, D3, D5, & D7 do > not have any disks listed (this is expected since I'm only using 1 > master drive per channel). What is not expected is that there are only 8 > possible drives listed. With 3 controllers, there should be 12 possible > drives with 6 drives detected. > -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20011108/332f8596/attachment.bin
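Whether a given kernel was built with that option can be checked without rebuilding anything; a minimal sketch, assuming the distribution installed its build config under /boot or that a configured source tree sits in /usr/src/linux:

    # look for the PDC202xx driver and its burst option in whichever
    # kernel config file happens to be present on this box
    grep PDC202XX /boot/config-`uname -r` /usr/src/linux/.config 2>/dev/null

If CONFIG_PDC202XX_BURST comes back unset or missing, the kernel needs to be reconfigured and rebuilt before the three-card setup can be retested.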
From math at velocet.ca Thu Nov 8 09:40:29 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <20011108161422.43393.qmail@web11403.mail.yahoo.com>; from raysonlogin@yahoo.com on Thu, Nov 08, 2001 at 08:14:22AM -0800 References: <20011108161422.43393.qmail@web11403.mail.yahoo.com> Message-ID: <20011108124029.T32202@velocet.ca> On Thu, Nov 08, 2001 at 08:14:22AM -0800, Rayson Ho's all... > I found something interesting from AMD's developer site: > > ...processor also features the advanced _MOESI_ cache coherency > protocol to ensure efficient cache integrity in a multiprocessing > environment. > > MOESI provides better performance in MP configurations, due to the > added O (owned) state. In theory, this can provide better cache > performance. What packages support MOESI and 3DNow!Professional? When will they? /kc -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From raysonlogin at yahoo.com Thu Nov 8 10:20:36 2001 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <20011108124029.T32202@velocet.ca> Message-ID: <20011108182036.64364.qmail@web11403.mail.yahoo.com> MOESI is a cache coherency protocol; you don't need new software or compiler support, all you need is the hardware chipset. 3DNow! Professional is essentially Intel's SSE, which Intel's compiler can generate vectorized code for. Rayson --- Velocet wrote: > What packages support MOESI and 3DNow!Professional? When will they? > > /kc > -- > Ken Chase, math@velocet.ca * Velocet Communications Inc. * > Toronto, CANADA > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com From math at velocet.ca Thu Nov 8 11:28:40 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <20011108182036.64364.qmail@web11403.mail.yahoo.com>; from raysonlogin@yahoo.com on Thu, Nov 08, 2001 at 10:20:36AM -0800 References: <20011108124029.T32202@velocet.ca> <20011108182036.64364.qmail@web11403.mail.yahoo.com> Message-ID: <20011108142839.X32202@velocet.ca> On Thu, Nov 08, 2001 at 10:20:36AM -0800, Rayson Ho's all... > MOESI is a cache coherency protocol; you don't need new software or > compiler support, all you need is the hardware chipset. From lindahl at conservativecomputer.com Thu Nov 8 11:48:52 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <20011108142839.X32202@velocet.ca>; from math@velocet.ca on Thu, Nov 08, 2001 at 02:28:40PM -0500 References: <20011108124029.T32202@velocet.ca> <20011108182036.64364.qmail@web11403.mail.yahoo.com> <20011108142839.X32202@velocet.ca> Message-ID: <20011108144852.A12948@wumpus.foo> On Thu, Nov 08, 2001 at 02:28:40PM -0500, Velocet wrote: > From what I understood from the useful articles that were > posted here, the cache protocol allows sharing data between the CPUs > via the northbridge directly. Right. What it comes down to is this: Getting data from L2 is always fastest if it's in your own L2. But if it isn't, some machines fetch from main memory faster than they can fetch a dirty line from someone else's L2. AMD's scheme has reasonably fast main memory fetches, plus even more efficient fetches from a remote L2. I believe the Sun E10k is one of the few machines where main memory is closer than someone else's L2. That makes false sharing even worse than usual. However, from the beowulf standpoint, most of us are running 2 independent mpi processes on dual cpu boxes, right? g
From tlovie at pokey.mine.nu Thu Nov 8 12:36:08 2001 From: tlovie at pokey.mine.nu (Thomas Lovie) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <20011108144852.A12948@wumpus.foo> Message-ID: <000e01c16894$ff3dc010$1106a8c0@sneezy> > On Thu, Nov 08, 2001 at 02:28:40PM -0500, Velocet wrote: > > > From what I understood from the useful articles that were > posted here, > > the cache protocol allows sharing data between the CPUs via the > > northbridge directly. > > Right. What it comes down to is this: Getting data from L2 is > always fastest if it's in your own L2. But if it isn't, some > machines fetch from main memory faster than they can fetch a > dirty line from someone else's L2. AMD's scheme has > reasonably fast main memory fetches, plus even more efficient > fetches from a remote L2. > > I believe the Sun E10k is one of the few machines where > main memory is closer than someone else's L2. That makes > false sharing even worse than usual. > > However, from the beowulf standpoint, most of us are running > 2 independent mpi processes on dual cpu boxes, right? I have an innocent question.... Does the kernel have processor affinity built into it yet? The situation may arise that one of the mpi processes gets bumped from its processor by a system task, then it in turn bumps the other mpi task from the other processor, and in effect, its info is cached in the other processor. I see the advantage that fetching from a remote L2 is better, but does anybody know the status of assigning a processor affinity mask to processes? Tom Lovie. From lindahl at conservativecomputer.com Thu Nov 8 12:55:30 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <000e01c16894$ff3dc010$1106a8c0@sneezy>; from tlovie@pokey.mine.nu on Thu, Nov 08, 2001 at 03:36:08PM -0500 References: <20011108144852.A12948@wumpus.foo> <000e01c16894$ff3dc010$1106a8c0@sneezy> Message-ID: <20011108155530.A13104@wumpus.foo> On Thu, Nov 08, 2001 at 03:36:08PM -0500, Thomas Lovie wrote: > I have an innocent question.... Does the kernel have processor affinity > built into it yet? Yes, and has for ages. Use the source, Luke: /usr/src/linux/kernel/sched.c, function goodness. This is 2.4: #ifdef CONFIG_SMP /* Give a largish advantage to the same processor... */ /* (this is equivalent to penalizing other processors) */ if (p->processor == this_cpu) weight += PROC_CHANGE_PENALTY; #endif g From hahn at physics.mcmaster.ca Thu Nov 8 13:10:17 2001 From: hahn at physics.mcmaster.ca (Mark Hahn) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <000e01c16894$ff3dc010$1106a8c0@sneezy> Message-ID: > I have an innocent question.... Does the kernel have processor affinity > built into it yet? yes, it has for years (even 2.2). in the mainline kernel, the affinity is just "be reluctant to move processes", but there's a patch (pset) if you really think you can do better manually. From raysonlogin at yahoo.com Thu Nov 8 13:23:13 2001 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:01:51 2009 Subject: Athlon MP vs Athlon XP In-Reply-To: <000e01c16894$ff3dc010$1106a8c0@sneezy> Message-ID: <20011108212313.26013.qmail@web11401.mail.yahoo.com> --- Thomas Lovie wrote: > I have an innocent question.... Does the kernel have processor > affinity > built into it yet? For the Linux kernel, yes. For Solaris, I don't know. Didn't have time to sign the NDA in order to access the Solaris source :-) Rayson > The situation may arise that one of the mpi > processes gets bumped from its processor by a system task, then it > in > turn bumps the other mpi task from the other processor, and in > effect, > its info is cached in the other processor. I see the advantage that > fetching from a remote L2 is better, but does anybody know the status > of > assigning a processor affinity mask to processes? > > Tom Lovie. > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com
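For those curious how sticky the 2.4 scheduler's affinity really is, it can be watched from userspace; a rough sketch (the a.out process name and the one-second interval are placeholders, and the PSR column requires a procps-style ps):

    # poll which CPU (PSR) each matching process is sitting on;
    # with the PROC_CHANGE_PENALTY heuristic the column should
    # change only rarely, even under light system load
    while true; do
        ps -eo pid,psr,pcpu,comm | grep a.out
        sleep 1
    done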
From cfernandes at elo.com.br Thu Nov 8 17:33:01 2001 From: cfernandes at elo.com.br (Claudio Fernandes) Date: Wed Nov 25 01:01:51 2009 Subject: Performance of pararallel programs Message-ID: <01110823330104.01108@master> hello, I would like to know about any tools to measure the performance of parallel programs over mpich in a scyld beowulf cluster. I'm looking for any trace library that keeps a record of a program's MPI calls. Thank you. Claudio Fernandes UNIVERSIDADE FEDERAL DO RN (UFRN) BRAZIL From jlong at arsc.edu Thu Nov 8 17:58:15 2001 From: jlong at arsc.edu (James Long) Date: Wed Nov 25 01:01:51 2009 Subject: Performance of pararallel programs In-Reply-To: <01110823330104.01108@master> References: <01110823330104.01108@master> Message-ID: http://www.pallas.de/pages/products.htm At 11:33 PM -0200 11/8/01, Claudio Fernandes wrote: >hello, > > I would like to know about any tools to measure the performance of parallel >programs over mpich in a scyld beowulf cluster. I'm looking for >any trace library that keeps a record of a program's MPI calls > > > Thank you. > > Claudio Fernandes > UNIVERSIDADE FEDERAL DO RN (UFRN) > BRAZIL > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf -- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% James Long MPP Specialist Arctic Region Supercomputing Center University of Alaska Fairbanks PO Box 756020 Fairbanks, AK 99775-6020 jlong@arsc.edu (907) 474-5731 work (907) 474-5494 fax %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% From patrick at myri.com Thu Nov 8 15:13:51 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:51 2009 Subject: Performance of pararallel programs References: <01110823330104.01108@master> Message-ID: <3BEB11AF.C72B2EC7@myri.com> Claudio Fernandes wrote: > I would like to know about any tools to measure the performance of parallel > programs over mpich in a scyld beowulf cluster. I'm looking for > any trace library that keeps a record of a program's MPI calls Jumpshot, included in MPICH. It's free and it works. I have been able to process trace files as large as 2 GBs. Patrick ---------------------------------------------------------- | Patrick Geoffray, Ph.D. patrick@myri.com | Myricom, Inc. http://www.myri.com | Cell: 865-389-8852 685 Emory Valley Rd (B) | Phone: 865-425-0978 Oak Ridge, TN 37830 ---------------------------------------------------------- From ron_chen_123 at yahoo.com Thu Nov 8 19:44:09 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:51 2009 Subject: Fwd: Grid Engine Training at SC2001 Message-ID: <20011109034409.32545.qmail@web14702.mail.yahoo.com> Beowulf and PBS users/developers, There will be a training session at SC2001. What is included is an intro to the API initiative, which provides a standard for PBS and SGE. In the near future, we can have SGE/PBS sharing components! See the email below for details. -Ron --- Conrad Geiger wrote: > For those that are attending SC2001, there is a free > Grid Engine (SGE) training session available.
> If you are interested in this open source Beowulf > job > management system and would like to attend, please > email > me and show up at the Denver location and time > listed below: > > Class: SGE (Grid Engine) training > Date: Monday, November 12 > Time: 1:00 p.m. - 4:00 p.m. > Classroom location: Colorado Ballroom F > Marriott Hotel, 1701 California Street, > Denver > (near Denver Convention Center) > > AGENDA > GRID ENGINE (SGE) TECHNICAL PRESENTATION: > > Sun Grid Engine (1 hour) > -- overview of concepts > -- installation options > -- architecture > -- information flow > -- scheduling > -- complexes and resource > management > -- parallel and checkpointing > > Examples (30 minutes) > -- complexes > -- load sensor > -- license management > -- immediate vs. low priority jobs > > SGE/EE technology (15 minutes) > -- tickets > -- share tree, functional, > deadline, override > > Grid Engine Integration with ClusterTools > (20 minutes) > > Grid Engine Open Source Project and API initiative > (20 minutes) > > For registration please reply to: > > Conrad.Geiger@Sun.COM > __________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com From ron_chen_123 at yahoo.com Thu Nov 8 19:46:33 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:51 2009 Subject: One more SGE event @ SuperComputing 2001 Message-ID: <20011109034633.32861.qmail@web14702.mail.yahoo.com> --John Tollefsrud wrote: > > If you are attending SC2001 (www.sc2001.org) > November 10 - 16, you may be > interested in the following events (if you live in > Denver and would like to > come by the event, drop me a note and I'll get you > in). Sorry for the > marketing-like blurb, but these are technical > presentations and I thought > they may be of interest. > > Thanks, > > jt > > j.t@sun.com > > -------- > > Grid Engine open source BOF > This Birds of a Feather will feature a live walk > through demonstration of > Grid Engine (for newbies) and some review of the > Grid Engine Enterprise > Edition features, and other discussion items of > interest to attendees. This > is part of the Conference Technical Program agenda > at SC2001. > > When: November 15, 2001 > 8:30 - 10am > > Where: Denver Convention Center > > > Grid Computing technical talk > This is a Sun Microsystems sponsored event, designed > to provide a good > exposure to the technical thought leaders and key > topics in Grid Computing. > Moderated by IDC Research Vice President Chris > Willard, the speakers > include: > > Mary Thomas, San Diego Supercomputing Center, on > Grid Access > Keith Gray, BP, on Cluster Grids > Craig Stair, Raytheon, on Campus Grids > Ian Foster, Globus, on Global Grids > Andrew Grimshaw, Avaki, on Global Grids > Ed Seidel, Cactus Project, on Grid Application > Frameworks > Wolfgang Gentzsch, Sun Microsystems, on Sun Grid > Computing > > When: November 14, 2001 > 8:00am - 8:30am Continental Breakfast > 8:30am - 9:30am Presentations > 9:30am - 9:45am Q & A > > Where: Denver Athletic Club > 1325 Glenarm Place > 1 block from the Denver Convention > Center > > > __________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com From ds10025 at cam.ac.uk Thu Nov 8 23:04:25 2001 From: ds10025 at cam.ac.uk (test) Date: Wed Nov 25 01:01:51 2009 Subject: Using NFS with Scyld (-7 ver.) 
References: <20011108105736.B12344@wumpus.foo> Message-ID: <002801c168ec$c57f7640$0301a8c0@cam.ac.uk> Good morning Where can I download a free copy of Scyld? Dan ----- Original Message ----- From: "Greg Lindahl" To: "Beowulf List" Sent: Thursday, November 08, 2001 3:57 PM Subject: Re: Using NFS with Scyld (-7 ver.) > On Mon, Nov 05, 2001 at 02:03:06PM -0500, Donald Becker wrote: > > > It does work: I wrote the original user-level NFS server (unfsd) used by > > Linux, and re-exporting was one of the primary advantages over the Sun > > implementation. > > Not only does re-exporting work, it works well enough that the CPlant > people use it for their single system disk, which is shared by more > than 1,000 nodes. There's 1 node per rack which mounts the one disk > and re-exports it to the other nodes in the rack. The extra caching is > very important to them. > > greg > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From lindahl at conservativecomputer.com Fri Nov 9 01:33:12 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:51 2009 Subject: RDRAM vs SDRAM redux In-Reply-To: ; from rgb@phy.duke.edu on Wed, Nov 07, 2001 at 11:26:39AM -0500 References: Message-ID: <20011109043312.A14638@wumpus.foo> On Wed, Nov 07, 2001 at 11:26:39AM -0500, Robert G. Brown wrote: > Life continues to get more puzzling all the time. We are working out > final configurations for a mixed purchase of P4's and Athlon XP's. Or > so I thought when I started to review the hardware alternatives this > morning. I'm basically getting ready to update a quote from three months > ago but the world has of course changed substantially in the meantime. What you're lacking is a good understanding of your code, cpu vs. memory bandwidth. One way to explore that is to play with your BIOS settings to artificially lower your STREAM bandwidth. Run your code and STREAM both ways, see what happens. > The Athlon update was fairly easy. It looks like the KT266A chipset is > probably the one of choice for a single CPU solution If you don't mind terrible PCI performance, yes. > The P4's are much more difficult because there are now SDRAM chipsets. Right. And if you knew your code's dependence on stream, you'd be all set. > There is also no clear indication on whether using > an SSE compiler with the XP makes a difference -- does the XP support > SSE1 and/or SSE2 instructions? If I recall correctly, the XP has SSE1, and the MP has SSE2, or at least more than the XP has. greg From xyzzy at speakeasy.org Fri Nov 9 02:15:24 2001 From: xyzzy at speakeasy.org (Trent Piepho) Date: Wed Nov 25 01:01:52 2009 Subject: RDRAM vs SDRAM redux In-Reply-To: <20011109043312.A14638@wumpus.foo> Message-ID: On Fri, 9 Nov 2001, Greg Lindahl wrote: > If I recall correctly, the XP has SSE1, and the MP has SSE2, or at > least more than the XP has. I'm pretty sure that neither the MP nor XP has SSE2 support. The Athlon Thunderbird has MMX but not SSE1 support; the MP, and now the XP, added SSE1.
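A direct way to answer this sort of instruction-set question on any given box is to read the flags the kernel detected at boot; a small sketch, assuming a 2.4-era /proc/cpuinfo:

    # show the interesting instruction-set flags of the first CPU;
    # Athlon XP/MP should list sse (plus 3dnow/3dnowext), while
    # sse2 should only appear on a Pentium 4
    grep '^flags' /proc/cpuinfo | head -1 | tr ' ' '\n' | \
        grep -E '^(mmx|sse|sse2|3dnow|3dnowext)$'

This only reports what the kernel recognized, so a very old kernel on a new CPU can under-report.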
From gropp at mcs.anl.gov Fri Nov 9 05:59:33 2001 From: gropp at mcs.anl.gov (William Gropp) Date: Wed Nov 25 01:01:52 2009 Subject: Performance of pararallel programs In-Reply-To: <01110823330104.01108@master> Message-ID: <5.1.0.14.2.20011109074742.04817b40@localhost> At 11:33 PM 11/8/2001 -0200, Claudio Fernandes wrote: >hello, > > I would like to know about any tools to mesure performance of > parallel >programs over mpich in a scyld beowulf cluster . I'm looking for >any trace library that keeps a record of a program's MPI calls MPICH comes with several such libraries in the mpe directory. If you are using the compilation scripts that come with MPICH, you can simply relink with -mpilog. The Jumpshot program provides a graphical display for the data. See http://www.mcs.anl.gov/perfvis/ for more information. Bill From rlatham at plogic.com Fri Nov 9 07:20:14 2001 From: rlatham at plogic.com (Rob Latham) Date: Wed Nov 25 01:01:52 2009 Subject: Scyld iso image In-Reply-To: <3BEA72D5.FA062766@fsagx.ac.be>; from leunen.d@fsagx.ac.be on Thu, Nov 08, 2001 at 12:56:05PM +0100 References: <3BEA72D5.FA062766@fsagx.ac.be> Message-ID: <20011109102014.T27969@otto.plogic.internal> On Thu, Nov 08, 2001 at 12:56:05PM +0100, David Leunen wrote: > Hello, > > Does anyone of you know a ftp site where I can found the iso image of > the latest scyld? I really can't wait the CD from linux central (and it > is an older version). gee, that's funny...i bought a couple scyld cd's from linux central not 3 weeks ago. took all of 3 days for them to come, and if i needed them "yesterday", i could have paid for the fast shipping. it's the "label side up" edition, 27BZ-8 ( january's linuxworld release was 27BZ-7). search the MARC archives ( http://marc.theaimsgroup.com/?l=beowulf&r=1&w=2 ) if you want to see the "free scyld iso download" discussion. ==rob -- [ Rob Latham Developer, Admin, Alchemist ] [ Paralogic Inc. - www.plogic.com ] [ ] [ EAE8 DE90 85BB 526F 3181 1FCF 51C4 B6CB 08CC 0897 ] From raysonlogin at yahoo.com Fri Nov 9 08:48:47 2001 From: raysonlogin at yahoo.com (Rayson Ho) Date: Wed Nov 25 01:01:52 2009 Subject: Fwd: [SSI] RFC: Etherboot/PXE to simplify installation and management Message-ID: <20011109164847.49991.qmail@web11402.mail.yahoo.com> FYI, Rayson --- "Brian J. Watson" wrote: > In an SSI cluster, it should only be necessary to install software > on a single node. Most other nodes can be thin clients, using > Etherboot or PXE to load their kernel and ramdisk from the > CLMS master. A potential CLMS master node needs to have its kernel > and ramdisk stored locally on a SCSI or IDE disk, in case it's > the first node booted in the cluster. Even a potential CLMS master, > however, can initially get its kernel and ramdisk via Etherboot/PXE > and install them onto its hard disk with minimal sysadmin > involvement. > > Etherboot is an open-source software package for creating ROM images > that allow a computer to boot off the network using DHCP or BOOTP. > For those who cannot or will not flash their ROM with one of these > images, Etherboot includes a special boot block for loading the image > from a floppy or hard drive. Etherboot appears to support about > a hundred different NIC models. Unfortunately, it only supports > the x86 platform right now. > > For more information, visit the Etherboot website: > http://etherboot.sourceforge.net/ > > PXE (Preboot Execution Environment) is an Intel specification for > doing pretty much the same thing. 
An advantage is that PXE images > come pre-loaded on certain NICs, but I suspect most PXE images are > closed source. > > To read Intel's PXE spec: > ftp://download.intel.com/ial/wfm/pxespec.pdf > > To support this new dependent node booting model, changes to initial > node installation would include: > - Making sure dhcpd and tftpd are installed as part of the base > Linux distribution. > - Installing mknbi (part of Etherboot) on the shared root for > building a tagged image of the kernel and ramdisk. > - Adding an /etc/ssitab file for specifying the MAC address, > IP address, node number, and local boot flag for each node > allowed to join the cluster. For each node with the local boot > flag set, a device for the boot partition must also be specified. > The local boot flag should only be set for potential CLMS master > nodes on the x86 platform. For platforms not supported by > Etherboot/PXE, such as Alpha, _all_ nodes should have the local > boot flag set. > - Eliminating /etc/cluster.conf, which is obsoleted by /etc/ssitab. > - Installing a new mkdhcpd.ssi command that builds /etc/dhcpd.conf > from the data in /etc/ssitab. To support non-SSI uses of DHCP, > it copies anything it finds in /etc/dhcpd.proto before appending > the generated lines. > - Installing a new lilo.ssi command that does the following: > * reads /etc/lilo.conf and /etc/ssitab, and uses onnode and > lilo > to sync the default kernel and ramdisk out to all potential > nodes that are up with the local boot flag set > * runs mknbi to generate a tagged image of the default kernel > and ramdisk in /tftpboot/, so that dependent nodes can > download it while booting > > In addition, changes will have to be made to the ramdisk, which means > changes to the mkinitrd.ssi script: > - Copy /etc/ssitab into the ramdisk. > - Enhance /linuxrc to match a local MAC address to an entry in > /etc/ssitab to determine the local IP address and node number. > - If the local boot flag is set, then /linuxrc compares the default > kernel and ramdisk on the shared root to those on the local disk. > > If they differ, it runs lilo.ssi with a special flag to just sync > the local disk. > - The hack in VI.3 of the installation instructions will go away. > Dave Zafman and I cooked up a scheme for /linuxrc to read > /proc/partitions and make all the devices it finds there. > That removes the need for the sysadmin to figure out the local > device names of the two GFS partitions. > - As well as building the ramdisk, mkinitrd.ssi also runs > mkdhcpd.ssi, since the sysadmin likely changed /etc/ssitab. > > Adding new nodes -- this is the beautiful part: > - Make sure there are enough available journals for the new nodes > on the GFS shared root. Note that the Cluster Filesystem (CFS) > that Dave is porting doesn't have this requirement, which makes > it better suited for large clusters. > - Edit /etc/ssitab to add records for each new node. The MAC > address can be determined by booting the new node with an > Etherboot floppy or ROM image. Although the DHCP server will > not respond to this unknown MAC address just yet, the node will > display on its console the MAC address of the card it discovered. > - Run mkinitrd.ssi to rebuild the SSI ramdisk and /etc/dhcpd.conf. > - Run lilo.ssi to distribute the new ramdisk to all nodes that are > up with the local boot flag set, and to rebuild the tagged image > in /tftpboot/. > - If a new node does not have the local boot flag set, just boot it > with the appropriate Etherboot/PXE ROM image or floppy. 
Like > magic, > it'll join the cluster. > - If the local boot flag is set, and the platform is x86, boot it > with the ROM image or floppy. While running /linuxrc, it'll sync > the local disk if the boot partition has already been created. > - If the boot partition has not been created, /linuxrc will proceed > with joining the cluster. Once it has joined, run fdisk and mkfs > to set up the boot partition. Then reboot the node one more time > with the ROM image or floppy, so it can sync the local disk the > next time it joins. > - On a platform that does not support Etherboot/PXE, the PITA > factor > is a bit higher for adding a new node (which must have the > local boot flag set). To avoid needless installation of the base > OS, try booting off a distribution CD into rescue mode. Use fdisk > and mkfs to set up the boot partition. Mount it. Either use a > floppy or set up networking to copy the default kernel and > ramdisk > from the cluster to the boot partition. Also, copy the > appropriate > stanza for your bootloader (e.g., aboot), and run it to install > the boot block. Now it's ready to join the cluster. Finally, > consider adding support for your platform to Etherboot or an > equivalent software package. > > Some weaknesses in this proposal are support for non-x86 platforms, > to which I've given some thought, and support for User Mode Linux, > to which I've given very little thought. There are probably other > weaknesses, but overall I think this improves the installation and > management of OpenSSI on the x86 platform. > > Suggestions are definitely welcome, especially since I haven't > started the implementation, yet. ;) > > -- > Brian Watson | "Now I don't know, but I been told it's > Linux Kernel Developer | hard to run with the weight of gold, > Open SSI Clustering Project | Other hand I heard it said, it's > Compaq Computer Corp | just as hard with the weight of lead." > Los Angeles, CA | -Robert Hunter, 1970 > > mailto:Brian.J.Watson@compaq.com > http://opensource.compaq.com/ > > _______________________________________________ > ssic-linux-devel mailing list > ssic-linux-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel __________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com From wsb at paralleldata.com Fri Nov 9 16:49:05 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:52 2009 Subject: Fwd: [SSI] RFC: Etherboot/PXE to simplify installation and management References: <20011109164847.49991.qmail@web11402.mail.yahoo.com> Message-ID: <3BEC7981.1BF493A5@paralleldata.com> Rayson Ho wrote: > > FYI, > > Rayson > > --- "Brian J. Watson" wrote: > > - Adding an /etc/ssitab file for specifying the MAC address, > > IP address, node number, and local boot flag for each node > > allowed to join the cluster. For each node with the local boot > > flag set, a device for the boot partition must also be specified. > > The local boot flag should only be set for potential CLMS master > > nodes on the x86 platform. For platforms not supported by > > Etherboot/PXE, such as Alpha, _all_ nodes should have the local > > boot flag set. Not sure what this guy is thinking but Alphas boot just fine with bootp on SRM supported network cards. That's what Etherboot does for x86 boxes. I doubt Alpha does PXE though. Pretty much all non-x86 UNIX boxes I've used can netboot using bootp. 
Linuxcentral sells x86 bootp capable network cards for those that are interested in that sort of thing. The cards use etherboot. Also, most new mobo's with built-in Enet support PXE from what I've seen. Wes From gkogan at students.uiuc.edu Sat Nov 10 22:31:27 2001 From: gkogan at students.uiuc.edu (german kogan) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: <20011106142455.C19207@kotako.analogself.com> Message-ID: I brought some of my slave nodes up. But in the Memory section for some of them it says 181/251MB (72%). Does this mean that 181 MB of memory are being used for something or that 181MB are free? Thanks From edwards at icantbelieveimdoingthis.com Sat Nov 10 22:25:47 2001 From: edwards at icantbelieveimdoingthis.com (Arthur H. Edwards,1,505-853-6042,505-256-0834) Date: Wed Nov 25 01:01:52 2009 Subject: Scyld iso image References: <3BEA72D5.FA062766@fsagx.ac.be> Message-ID: <3BEE19EB.5020504@icantbelieveimdoingthis.com> David Leunen wrote: >Hello, > >Does anyone of you know a ftp site where I can found the iso image of >the latest scyld? I really can't wait the CD from linux central (and it >is an older version). > >I have a pretty fast connection and it shouldn't be long. I will very >much appreciate if you provide it for me. You can answer me to my >personal e-mail or on this mail-list. > >Thank you. > >David >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > >. > I just got the cd from Linux central for 9.95 total. It took two days and says it is z-8. Art Edwards From agrajag at scyld.com Sun Nov 11 20:35:52 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: ; from gkogan@students.uiuc.edu on Sun, Nov 11, 2001 at 12:31:27AM -0600 References: <20011106142455.C19207@kotako.analogself.com> Message-ID: <20011111233552.A6888@blueraja.scyld.com> On Sun, 11 Nov 2001, german kogan wrote: > > I brought some of my slave nodes up. But in the Memory section for some of > them it says 181/251MB (72%). Does this mean that 181 MB of memory are > being used for something or that 181MB are free? It means that 181M out of 251M are used, and that's approximately 72% of the RAM. When looking at this number, its important to remember that the 181M is the RAM being used by processes on the system, as well as any memory the kernel is using for buffers and cache (such as it uses with filesystems to speed up repeated accesses). -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20011111/18f84b46/attachment.bin From gkogan at students.uiuc.edu Sun Nov 11 20:48:41 2001 From: gkogan at students.uiuc.edu (german kogan) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: <20011111233552.A6888@blueraja.scyld.com> Message-ID: On Sun, 11 Nov 2001, Sean Dilda wrote: > On Sun, 11 Nov 2001, german kogan wrote: > > > > > I brought some of my slave nodes up. But in the Memory section for some of > > them it says 181/251MB (72%). Does this mean that 181 MB of memory are > > being used for something or that 181MB are free? > > It means that 181M out of 251M are used, and that's approximately 72% of > the RAM. 
When looking at this number, it's important to remember that > the 181M is the RAM being used by processes on the system, as well as > any memory the kernel is using for buffers and cache (such as it uses > with filesystems to speed up repeated accesses). > Thanks. But it seems that too much RAM is being used up. All I have done is boot > up the slave nodes, and have not run anything on them. Or is this normal? See what I wrote before. That number includes memory the kernel might be using for buffers and cache. You might also want to try doing 'bpsh free' to see a breakdown of how the memory on the slave node is being used. > > Also, another question is about mpi. I have run a simple test code on > the cluster, and some processes seem to run on the master node. What do I > have to do to prevent this from happening? So that the processes only run on > the slave nodes. I'm assuming you're using -8. When running your MPI job, set NO_LOCAL=1 just like you set the NP -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20011112/72dd0484/attachment.bin From gkogan at students.uiuc.edu Sun Nov 11 21:19:34 2001 From: gkogan at students.uiuc.edu (german kogan) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: <20011112000615.B6888@blueraja.scyld.com> Message-ID: > > > > Also, another question is about mpi. I have run a simple test code on > > the cluster, and some processes seem to run on the master node. What do I > > have to do to prevent this from happening? So that the processes only run on > > the slave nodes. > > I'm assuming you're using -8. When running your MPI job, set NO_LOCAL=1 > just like you set the NP > What do you mean by -8? What does NO_LOCAL=1 mean and do I have to set this every time I run mpi? The command I use for running mpi is 'mpirun -np "number of processes" ./a.out'. Where would I put NO_LOCAL=1?
Thanks From agrajag at scyld.com Sun Nov 11 21:41:54 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: ; from gkogan@students.uiuc.edu on Sun, Nov 11, 2001 at 11:19:34PM -0600 References: <20011112000615.B6888@blueraja.scyld.com> Message-ID: <20011112004154.C6888@blueraja.scyld.com> On Sun, 11 Nov 2001, german kogan wrote: > What do you mean by -8? What does NO_LOCAL=1 mean and do I have to set > this every time I run mpi? The command I use for running > mpi is 'mpirun -np "number of processes" ./a.out'. Where would I put > NO_LOCAL=1? -8 as in 27az-8, 27bz-8 or 27cz-8. I had thought that you were starting your jobs by doing: NP=4 ./a.out in which case you'd do: NP=4 NO_LOCAL=1 ./a.out (you can replace '4' with however many processes you actually want) As you're using mpirun, you can also do: mpirun -np "number of processes" -nolocal ./a.out -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 232 bytes Desc: not available Url : http://www.scyld.com/pipermail/beowulf/attachments/20011112/6973f016/attachment.bin From gkogan at students.uiuc.edu Sun Nov 11 23:26:22 2001 From: gkogan at students.uiuc.edu (german kogan) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: <20011112004154.C6888@blueraja.scyld.com> Message-ID: On Mon, 12 Nov 2001, Sean Dilda wrote: > On Sun, 11 Nov 2001, german kogan wrote: > > > What do you mean by -8? What does NO_LOCAL=1 mean and do I have to set > > this every time I run mpi? The command I use for running > > mpi is 'mpirun -np "number of processes" ./a.out'. Where would I put > > NO_LOCAL=1? > > -8 as in 27az-8, 27bz-8 or 27cz-8. > > I had thought that you were starting your jobs by doing: > NP=4 ./a.out > in which case you'd do: > NP=4 NO_LOCAL=1 ./a.out > > (you can replace '4' with however many processes you actually want) > > As you're using mpirun, you can also do: > mpirun -np "number of processes" -nolocal ./a.out > For some reason I have two copies of mpirun, one in /usr/bin/ and one in /usr/mpi_beowulf/bin. But when I try running some code with the copy in /usr/mpi_beowulf/bin I get the following error "p0_4360: p4_error: net_create_slave: host not a bproc node: -3 p4_error: latest msg from perror: Success", but it does have all the mpi options such as -nolocal etc; it shows me all these when I type something like "mpirun -h". However I can run code with the mpirun from /usr/bin. But when I tried doing /usr/bin/mpirun -np 4 -nolocal ./a.out I get the following error "Failed to exec target program: No such file or directory". Do you have any ideas? Thanks From aby_sinha at yahoo.com Mon Nov 12 01:54:28 2001 From: aby_sinha at yahoo.com (Abhishek Sinha) Date: Wed Nov 25 01:01:52 2009 Subject: vi architecture Message-ID: <3BD5F389001E772D@mail.san.yahoo.com> (added by postmaster@mail.san.yahoo.com) Hello, I had a look at the Virtual Interface architecture and the idea of user-level networking seems good, but I am in doubt whether I can use it for commercial purposes or not. This might seem like a newbie question, but if VI does what it promises, what could be the disadvantages of using it? Anyone having experience with it, please enlighten!! Regards Abhishek Sinha From jakob at unthought.net Mon Nov 12 02:47:33 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:52 2009 Subject: Compile farm?
In-Reply-To: <20011108105814.C12344@wumpus.foo>; from lindahl@conservativecomputer.com on Thu, Nov 08, 2001 at 10:58:14AM -0500 References: <20011104212912.W14001@unthought.net> <20011108105814.C12344@wumpus.foo> Message-ID: <20011112114733.B30421@unthought.net> On Thu, Nov 08, 2001 at 10:58:14AM -0500, Greg Lindahl wrote: > On Sun, Nov 04, 2001 at 09:29:12PM +0100, Jakob Østergaard wrote: > > > Problem is - mosix migrates jobs after a while. Initially a compiler > > takes up a few megabytes of memory, but "after a while" it has grown > > to hundreds of megabytes. When mosix decides to migrate the compiler > > it will spend a long time on the network to move the large process > > image. > > I've never used Mosix. Does it have the ability to set policies like > "this binary should always be immediately migrated at exec" or "all > processes should be migrated at exec"? You'd think it would... and > using such policies would solve this particular problem. Sorry for the lag :) I don't know if you can set "migrate on exec" - I didn't experiment that much with it. I did try to tell it to migrate as early as possible, but couldn't make it do so to my satisfaction... But much has changed since then I'm sure. I don't know if the early migration options allow for migrate-on-exec - there would be some fundamental problems with that too. Mosix considers the CPU/memory requirements of the process and migrates to the host "best suited at that time". Mosix would have to know about gcc, and know that it should migrate early, or (almost) never migrate. I don't know how it would look today. The other problem with Mosix is that it requires you to run the same kernel on all machines. Now parallel compiles are usually done on a fairly homogeneous cluster, but in my situation it's not really an option to run the same kernel revision on all machines. I just keep my headers, libraries and tools homogeneous. -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............: From Lechner at drs-esg.com Mon Nov 12 07:33:54 2001 From: Lechner at drs-esg.com (Lechner, David) Date: Wed Nov 25 01:01:52 2009 Subject: Good network traffic visualization tools ? Message-ID: I am investigating the performance of a multi-component software program on a cluster wrt various HW and network configurations. Can anyone suggest good tools to help monitor network utilization? I understand that SNMP-enabled switches allow doing this, but I am using generic products now, not branded products with lots of software support. So far I have looked at KSniffer, Cricket, Cheops, IPTraf, and NTop. I am using a mix of programs and need to measure the capability of distributed programs that use direct sockets without MPI. We'd like to see the traffic monitored "real-time" via some color-coded matrix screen that shows traffic BW between nodes; even just a table of values for all traffic within a cluster would be good enough for now - we would snap that data into a vis. tool (an approach common to many of the tools I mention above). I also need Windows support as well as Linux. Thanks in advance - Dave Lechner.
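Short of an SNMP-capable switch, a per-node probe of /proc/net/dev gets most of the way to such a traffic table; a rough sketch (Linux-only; the interface name and sampling interval are assumptions, and the script is illustrative rather than part of any tool named above):

    #!/bin/sh
    # sample rx/tx byte counters for one interface twice, N seconds
    # apart, and print the average throughput over that window
    IF=${1:-eth0}; N=${2:-5}
    snap() { awk -v ifc="$IF:" '$1 ~ ifc { sub(/.*:/, ""); print $1, $9 }' /proc/net/dev; }
    set -- `snap`; RX1=$1 TX1=$2
    sleep $N
    set -- `snap`; RX2=$1 TX2=$2
    echo "$IF rx $(( (RX2 - RX1) / N )) bytes/sec tx $(( (TX2 - TX1) / N )) bytes/sec"

Run it under rsh or bpsh against every node and collect the output into one table, and you have a crude version of the real-time matrix described above.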
From akostocker at hotmail.com Mon Nov 12 09:24:13 2001 From: akostocker at hotmail.com (Tony Stocker) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld Message-ID: Hi there, I'm also seeing this "issue" but on a slightly larger scale. All of my slave nodes (currently at 6) have 1GB of memory yet in the status monitor they all show 670MB used. And the cluster isn't doing anything at all; it's just booted up. What buffers and cache need to eat up this much of the memory? And more importantly, isn't this going to affect performance when I actually do try to run something on the cluster? -Tony _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp From mlrecv at yahoo.com Mon Nov 12 09:43:23 2001 From: mlrecv at yahoo.com (Zhifeng F. Chen) Date: Wed Nov 25 01:01:52 2009 Subject: M-VIA 1.2 problems Message-ID: <00cc01c16ba1$87af16b0$906a7080@divine> Hi, Does anyone have experience with M-VIA 1.2b2? I am trying to compile and install M-VIA 1.2b2 on a RedHat 7.2, 2.4.10, SMP, 1G memory system with an Intel Eepro100 NIC. When I compile the source, some errors come up, and I fixed them by following the FAQ (change gcc to kgcc; add #undef min in front of #define min in vipk_core/vipk_rmm.h; remove { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82820FW_4,PCI_ANY_ID, PCI_ANY_ID, }, from eepro100.c). The compilation was successful. After I installed the device drivers (via_lo.o, via_eth0.o), created the /dev/* entries, and ran modprobe via_lo via_eth0, I tried to test them by using vnettest /dev/via_lo r localhost, and vnettest /dev/via_lo s localhost on the same machine. The machine becomes completely dead. Does anyone have ideas or comments on what is happening? ZF -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20011112/58672e7f/attachment.html From mlrecv at yahoo.com Mon Nov 12 13:43:04 2001 From: mlrecv at yahoo.com (Zhifeng F. Chen) Date: Wed Nov 25 01:01:52 2009 Subject: Compilation problem. Message-ID: <01b501c16bc3$02c6c3e0$906a7080@divine> Hi, When compiling mvich-1.0a6.1 under mpich-1.2.2.3, ./configure --with-device=via --with-arch=LINUX --without-romio -cflags="-DUSE_STDARG -O2 -DCPU_X86 -DNIC_GIGANET -DVIPL095" -lib="-lgnivipl -lpthread" is fine.
When I came to make, it reports: cc1: warnings being treated as errors queue.c: In function `MPID_Search_unexpected_for_request': queue.c:296: warning: implicit declaration of function `MPID_AINT_CMP' make[3]: *** [queue.o] Error 1 Exit status from make was 2 make[2]: *** [mpilib] Error 1 make[1]: *** [mpi-modules] Error 2 make: *** [mpi] Error 2 Can anyone help me out? ZF -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20011112/2c787f39/attachment.html From hanzl at noel.feld.cvut.cz Tue Nov 13 00:21:32 2001 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:01:52 2009 Subject: DUPLICATE MESSAGES ON THIS LIST - Compaq makes mess again Message-ID: <20011113092132B.hanzl@unknown-domain> Once again, a misconfigured mailer at a Compaq site sends list messages back to this list (and most likely to ANY list) and makes them look like duplicates from the original author. Now it comes via zmamail04.zma.compaq.com. (Last time it came via zcamail4.zca.compaq.com and zcamail5.zca.compaq.com) If you think somebody is mad, it is not you or the message author, it is the Compaq postmaster. If they do have any. Regards Vaclav > Subject: RE: beowulf.org list problems > From: Peter Bowen > To: hanzl@noel.feld.cvut.cz > Cc: Nathalie.Viollet@Compaq.com > Date: 01 Nov 2001 08:57:49 -0500 > X-Mailer: Evolution/0.16.99+cvs.2001.10.31.15.22 (Preview Release) > > I have marked all messages from zcamail[0-9]*.zca.compaq.com for hand > moderation. I expect that this mailer problem will be fixed, but if I > see messages caught on or after Nov 8, I will disable all offending > mailing list memberships. > > I do not like taking harsh measures, but clearly this is causing > problems for many people, and is caused by a simple server > misconfiguration. > > Thanks. > Peter > > On Thu, 2001-11-01 at 05:27, hanzl@noel.feld.cvut.cz wrote: > > Peter, > > > > > If I see this re-occur, I will take appropriate measures > > > > It re-occurred. Please blacklist them; those Compaq guys are > > unable to mend their mailer. The Beowulf list is unusable with this mess > > included. > > > > Thanks > > > > Vaclav > > > > ------------------------------- > > > > > > > Subject: RE: beowulf.org list problems > > > From: Peter Bowen > > > To: hanzl@noel.feld.cvut.cz > > > Cc: Nathalie.Viollet@Compaq.com > > > Date: 29 Oct 2001 10:13:04 -0500 > > > X-Mailer: Evolution/0.15.99 (Preview Release) > > > > > > I am no longer seeing duplicate posts on the list, and, therefore, will > > > not be blacklisting zcamail?.zca.compaq.com from beowulf.org lists. If > > > I see this re-occur, I will take appropriate measures at that time. > > > > > > Thanks. > > > Peter > > > > > > On Mon, 2001-10-29 at 04:29, hanzl@noel.feld.cvut.cz wrote: > > > > Hi Nathalie, > > > > > > > > thanks for the swift reaction. > > > > > > > > > But I did not know that my e-mail was forwarded many times > > > > > > > > Your mail was OK, it was messages from OTHER people which got repeated > > > > by your site (and made to look like repeated mail). > > > > > > > > You will also find (as everybody else subscribed to beowulf) those copies > > > > in your personal mailbox (if you kept beowulf messages delivered to > > > > you). Last repeated message I got was from Ron Chen (Subject: Re: > > > > [PBS-USERS] SC2001 technical papers online). 
> > > > > > > > Second copy of this message went this way: > > > > > > > > Received: from zcamail04.zca.compaq.com (zcamail04.zca.compaq.com [161.114.32.104]) > > > > by blueraja.scyld.com (8.11.6/8.11.6) with ESMTP id f9T27l032336 > > > > for ; Sun, 28 Oct 2001 21:07:47 -0500 > > > > Received: by zcamail04.zca.compaq.com (Postfix, from userid 12345) > > > > id 5AB25F9A; Sun, 28 Oct 2001 18:10:34 -0800 (PST) > > > > Received: from excmun-gh01.dem.cpqcorp.net (excmun-gh01.dem.cpqcorp.net [16.41.88.60]) > > > > by zcamail04.zca.compaq.com (Postfix) with ESMTP > > > > id 7D447E4D; Sun, 28 Oct 2001 18:10:32 -0800 (PST) > > > > Received: by excmun-gh01.dem.cpqcorp.net with Internet Mail Service (5.5.2650.21) > > > > id ; Mon, 29 Oct 2001 03:07:36 +0100 > > > > > > > > Here is a list of messages like this (first is your regular post, > > > > others are probably copies of other people's mails): > > > > > > > > ~/Mail/beowulf>for x in `fgrep -l .zca.compaq.com *`; do fgrep Subject $x; done > > > > > > > > Subject: 2.4 network booted kernel > > > > Subject: Re: good commodity NIC > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: good commodity NIC > > > > Subject: Core files under mpich, p4 device > > > > Subject: Re: good commodity NIC > > > > Subject: good commodity NIC > > > > Subject: good commodity NIC > > > > Subject: Add host in PVM > > > > Subject: Killer SCSI 1 TB fileserver > > > > Subject: Core files under mpich, p4 device > > > > Subject: Re: PBS x SGE comparison? > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: SGE and Scyld > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Re: Core files under mpich, p4 device > > > > Subject: Re: FNN vs GigabitEther & Myrinet > > > > Subject: good commodity NIC > > > > Subject: Re: PBS x SGE comparison? > > > > Subject: Re: using mpich 1.2.2.2 with ifc > > > > Subject: using mpich 1.2.2.2 with ifc > > > > Subject: Killer SCSI 1 TB fileserver > > > > Subject: Failed to mount using Scyld > > > > Subject: Re: good commodity NIC > > > > Subject: good commodity NIC > > > > Subject: RE: using mpich 1.2.2.2 with ifc > > > > Subject: Add host in PVM > > > > Subject: Re: [PBS-USERS] SC2001 technical papers online > > > > > > > > Please forward this additional explanation to your site adminsitrator > > > > (unless (s)he already knows for sure what was going on). > > > > > > > > Regards > > > > > > > > Vaclav > > > > > > > From P.Waltner at science-computing.de Tue Nov 13 04:06:39 2001 From: P.Waltner at science-computing.de (Peter Waltner) Date: Wed Nov 25 01:01:52 2009 Subject: NFSv3-Client bug with large files in kernel-2.2.19-13.beo Message-ID: <200111131206.NAA0000015404@trantor.science-computing.de> Large file support works only partially with NFSv3 with the Beowulf- kernel-2.2.19-13.beo. I can write a file > 2GB over NFS, but I can only read the first 2 GB of the file over NFS. 
Also, ls shows the wrong file size in NFS directories. Local: waltnepe@ce05sl14 /home/waltnepe > ls -lh /scr/ce05sl14/scr1/waltnepe/ -rwxr-x--- 1 waltnepe admin 2.1G Nov 13 10:51 Bonnie.12536 NFS: waltnepe@ce05sl16 /home/waltnepe > ls -lh /scr/ce05sl14/scr1/waltnepe/ -rwxr-x--- 1 waltnepe admin 2.0G Nov 13 10:51 Bonnie.12536 I checked this with Linux and Irix NFS servers. Peter From jnellis at dslextreme.com Tue Nov 13 15:21:40 2001 From: jnellis at dslextreme.com (Joe Nellis) Date: Wed Nov 25 01:01:52 2009 Subject: Upgrading to 27bz-8 Message-ID: <000e01c16c99$f374d1c0$73f2a540@dslextreme.com> Greetings, We would like to install this new version of the Scyld software and we are currently running the -7 version. When we originally installed -7 our nodes didn't have floppies or cdroms, so we had to crack each case and hook up a floppy to get it booted once. This took considerable time. Once all the nodes were booted, we moved the boot image to each node's individual harddisk. Now I am wondering how we can avoid this again. If we install -8 onto our master node, will the nodes come up in enough of a condition with their -7 boot image to rewrite a new boot image to their harddisks? Otherwise I am assuming I will need a -8 boot image to write to the node disks before I even install -8 on the master. I hope this isn't confusing. thanks, Joe Nellis jnellis@dslextreme.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20011113/1ebec38a/attachment.html From jlong at arsc.edu Tue Nov 13 15:37:13 2001 From: jlong at arsc.edu (James Long) Date: Wed Nov 25 01:01:52 2009 Subject: Beowwulf Status Monitor on Scyld In-Reply-To: References: Message-ID: Free memory is wasted memory. It is used as a disk cache until needed by an application, at which time the disk cache is reduced. Jim At 5:24 PM +0000 11/12/01, Tony Stocker wrote: >Hi there, > >I'm also seeing this "issue" but on a slightly larger scale. All of my >slave nodes (currently at 6) have 1GB of memory yet in the status monitor they >all show 670MB used. And the cluster isn't doing anything at all; it's just >booted up. What buffers and cache need to eat up this much of the memory? >And more importantly, isn't this going to affect performance when I actually >do try to run something on the cluster? > >-Tony > >_________________________________________________________________ >Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf -- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% James Long MPP Specialist Arctic Region Supercomputing Center University of Alaska Fairbanks PO Box 756020 Fairbanks, AK 99775-6020 jlong@arsc.edu (907) 474-5731 work (907) 474-5494 fax %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
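The buffers/cache split James describes is visible directly in free's output; a quick sketch for a Scyld-style cluster (node number 0 and the -m flag are just for illustration):

    # memory use on one slave node, in MB; the '-/+ buffers/cache'
    # row shows what processes really consume, since buffer and
    # cache pages are handed back as soon as applications want them
    bpsh 0 free -m

On the master itself, plain 'free -m' gives the same breakdown.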
>
>-Tony
>
>_________________________________________________________________
>Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp
>
>_______________________________________________
>Beowulf mailing list, Beowulf@beowulf.org
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf

--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
James Long
MPP Specialist
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks, AK 99775-6020
jlong@arsc.edu
(907) 474-5731 work
(907) 474-5494 fax
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

From hanzl at noel.feld.cvut.cz Wed Nov 14 03:09:55 2001 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:01:52 2009 Subject: Upgrading to 27bz-8 In-Reply-To: <000e01c16c99$f374d1c0$73f2a540@dslextreme.com> References: <000e01c16c99$f374d1c0$73f2a540@dslextreme.com> Message-ID: <20011114120955L.hanzl@unknown-domain>

I guess that once you have managed to beoboot your nodes from their hard disks (i.e. you have a working beoboot partition, type 89), you should never have to touch the nodes again; they should boot from the newly installed master as well. (Unless you change the network card on a slave node; installing a 3c905C, for example, might cause you problems.)

Regards

Vaclav

> From: "Joe Nellis"
>
> Greetings,
>
> We would like to install this new version of the Scyld software; we are
> currently running the -7 version. When we originally installed -7 our
> nodes didn't have floppies or CD-ROMs, so we had to crack each case and
> hook up a floppy to get it booted once. This took considerable time.
> Once all the nodes were booted, we moved the boot image to each node's
> individual hard disk. Now I am wondering how we can avoid this again.
> If we install -8 onto our master node, will the nodes come up in enough
> of a condition with their -7 boot image to rewrite a new boot image to
> their hard disks? Otherwise I am assuming I will need a -8 boot image
> to write to the node disks before I even install -8 on the master. I
> hope this isn't confusing.
>
> thanks,
> Joe Nellis
> jnellis@dslextreme.com

From John.ws.Strange at marconi.com Wed Nov 14 05:58:26 2001 From: John.ws.Strange at marconi.com (Strange, John) Date: Wed Nov 25 01:01:52 2009 Subject: Compile farm? Message-ID: <313680C9A886D511A06000204840E1CF3F0F3B@whq-msgusr-02.pit.comms.marconi.com>

Well, you can use mexec with MOSIX to get things to work, and it does work quite well, but it doesn't scale because of some underlying filesystem problems we are having. I've got 25 machines; our backend storage currently is NetApp filers, so using NFS I have to turn off client-side caching. It basically crushes the filer doing constant file-handle lookups.

I'm still playing with a NetApp that we have spare; maybe I'll have some luck finding a way around the problems that we are having. There is no really good backend filesystem that you can use; maybe GFS, but it's still relatively new and too bleeding-edge for practical use (IMHO). Plus we don't have the hardware for it (fiber channel) and we have *NO* budget.

If anyone has any suggestions I would be glad to hear them.

Thanks, John Strange Marconi john.ws.strange.at.marconi.com
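For reference, client-side attribute caching on a Linux NFS mount is normally controlled with the standard noac/actimeo mount options; this is a generic sketch, with server:/vol0 and /mnt/build as placeholder names rather than John's actual configuration:

    # disable attribute caching entirely: every stat goes back to the filer
    mount -t nfs -o noac server:/vol0 /mnt/build

    # or keep caching but shorten the attribute timeout to 1 second
    mount -t nfs -o actimeo=1 server:/vol0 /mnt/build

The tradeoff is exactly the one John describes: noac gives coherent metadata for parallel builds but hammers the server with lookups, while actimeo lets you tune how stale you can afford to be.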
-----Original Message----- From: Scott Thomason [mailto:SThomaso@phmining.com] Sent: Friday, November 02, 2001 2:25 PM To: Beowulf@beowulf.org Subject: Compile farm?

Greetings. I'm interested in setting up a shell account/batch process/compile farm system for our developers, and I'm wondering if Beowulf clusters are well suited to that task. We're not interested in writing parallel code using PVM or MPI, we just want to log into what appears to be one big server and have it dispatch the workload amongst the slave processors. Is Beowulf good at that?
---scott

p.s. Sorry if there are duplicates of this message; I used the wrong email address earlier.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From hanzl at noel.feld.cvut.cz Wed Nov 14 08:49:09 2001 From: hanzl at noel.feld.cvut.cz (hanzl@noel.feld.cvut.cz) Date: Wed Nov 25 01:01:52 2009 Subject: NFS export from clients? Message-ID: <20011114174909J.hanzl@unknown-domain>

Did anybody manage to NFS-export disks from Scyld nodes? I suppose I should start the NFS daemons from the node_up script and re-export the filesystem out of the cluster using unfsd. (I would also like to mount node hard disks using autofs, to maybe avoid fsck after a node crash.)

I do not want PVFS; I want individual, independent filesystems on the nodes (there are several large data sets and often just one of them is used; other nodes may even be off).

I will try, but if anybody already has some experience with this, I would be happy to hear from him.

Thanks

Vaclav

From raij at cs.unc.edu Wed Nov 14 15:10:16 2001 From: raij at cs.unc.edu (Andrew B. Raij) Date: Wed Nov 25 01:01:52 2009 Subject: Public slaves Message-ID:

Hi everybody, I'd like to set up a Scyld cluster with slaves open to the public network. I'd also like each slave to get the same IP of my choosing every time it is booted, and slave IPs shouldn't have to be confined to any specific range. I understand that doing this is contradictory to the Beowulf design, but is it possible?

thanks, -Andrew

From ron_chen_123 at yahoo.com Wed Nov 14 21:43:54 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:52 2009 Subject: Fwd: [Globus-discuss] Globus Toolkit in the news Message-ID: <20011115054354.76538.qmail@web14704.mail.yahoo.com>

Entropia's distributed computing used in grid computing.

-Ron

--- Ian Foster wrote:
> Date: Wed, 14 Nov 2001 15:23:58 -0600
> To: discuss@globus.org, management@globus.org
> From: Ian Foster
> Subject: [Globus-discuss] Globus Toolkit in the news
>
> http://news.cnet.com/news/0-1003-200-7849355.html
>
> _______________________________________________________________
> Ian Foster                      http://www.mcs.anl.gov/~foster
> Math & Computer Science Div.    Dept of Computer Science
> Argonne National Laboratory     The University of Chicago
> Argonne, IL 60439, U.S.A.       Chicago, IL 60637, U.S.A.
> 630 252 4619 (fax 5986)         773 702 3487 (fax 8487)

__________________________________________________ Do You Yahoo!? Find the one for you at Yahoo!
Personals http://personals.yahoo.com

From bdorland at kendall.umd.edu Thu Nov 15 07:48:08 2001 From: bdorland at kendall.umd.edu (Bill Dorland) Date: Wed Nov 25 01:01:52 2009 Subject: Scyld 27bz-8 problem (symptom: netstat) Message-ID: <200111151548.fAFFm8K20412@kendall.umd.edu>

I recently purchased the $2.95 copy (version 27bz-8) of Scyld and have experienced some difficulties with the installation. Before putting together a long post, I'd like to know if anyone else has successfully performed a diskless installation of

Scyld Beowulf "Label Side Up" Edition Copyright 2001 Scyld Computing Corp. P/N: 27BZ-8

If so, I am curious whether anyone else has experienced an incorrect response from the command 'netstat -avupt' when executed as root. I find that the system does not believe root is root. I have not connected my cluster to the internet, and I installed to a new, blank hard drive. No other software has been introduced.

In a completely unrelated incident, another system that I work on was compromised some time back by a rootkit which exploited a vulnerability in SSH, and interestingly, one of the early symptoms of trouble on that system was this same thing: root execution of 'netstat -avupt' complained that root was not root. The version of SSH that is shipped with 27bz-8 is in fact vulnerable to the attack that I experienced on this unrelated system. I am therefore concerned that something that is admittedly quite unlikely might have happened, i.e., that the 27bz-8 distribution was shipped despite having been compromised in some way. I would be very happy to hear from anyone who can assure me that this is not the case by providing some explanation for the odd netstat behavior. In the meantime, I have spent several days tracking down this problem and will continue to do so.

Since the openssh rpms shipped with Scyld are modified to be compatible with LFS (not to mention the kernel and so on), I cannot trivially recover from this problem, if it is indeed a problem. Also, I cannot find any patches or updates to the 27bz-8 release on line.

This is my first post to a public list server. I apologize in advance for any breach of netiquette.

--Bill

From wrankin at ee.duke.edu Thu Nov 15 09:35:16 2001 From: wrankin at ee.duke.edu (William T. Rankin) Date: Wed Nov 25 01:01:52 2009 Subject: Public slaves In-Reply-To: <200111151701.fAFH15029660@blueraja.scyld.com> Message-ID:

> From: "Andrew B. Raij"
>
> Hi everybody,
>
> I'd like to set up a Scyld cluster with slaves open to the public
> network. I'd also like each slave to get the same IP of my choosing every
> time it is booted, and slave IPs shouldn't have to be confined to any
> specific range. I understand that doing this is contradictory to the
> Beowulf design, but is it possible?

What you are talking about is to set up all the nodes as general purpose workstations and using them as a cluster. This isn't "contrary" to the beowulf design (that's how my first cluster was set up). It is contrary IIRC to the basic Scyld assumptions.

Have you considered just using kickstart with a standard linux distribution to configure your machines? Or is there something specific to Scyld that you are interested in?
-bill

From Nirmal.Bissonauth at durham.ac.uk Thu Nov 15 09:49:24 2001 From: Nirmal.Bissonauth at durham.ac.uk (Nirmal Bissonauth) Date: Wed Nov 25 01:01:52 2009 Subject: Tyan Tunder K7 and Gigabit Ethernet cards Message-ID:

Hi all, I would like to know if people have been successful in using gigabit ethernet (over copper) cards with the Tyan Thunder K7 motherboard (S2462). This has two built-in 3Com 100Base-T cards.

I have tried to use a D-Link DGE-550T card with the latter, but without much success. Even after disabling the onboard NICs, the card did not work properly. The problem is that an interrupt is not set after a DMA transmit (something to do with the APIC, I presume). I tried Linux kernel 2.4.12-ac3 with the latest driver from D-Link's website, but that did not make much difference either. I have six of these.

The cards that I am particularly interested to hear about are the
3Com Gigabit Network card (3c996-T) or (3c996B-T) or (3c1000-T)
Intel PRO/1000 Gigabit Server 1000B-T PCI Adapter (PWLA8490T)
NETGEAR 100/1000Base-T Copper Gigabit Ethernet Adapter (GA-623T)
Netgear 100/1000BASET 32/64 BIT PCI Gigabit Adapter (GA-622T)
SMC TigerCard 1000BaseTX 32/64Bit PCI Gb Ethernet Nic (SMC9462TX)

Or any other cheap gigabit network cards.

Regards Nirmal

-----------------------------------------------------------------------
Nirmal Bissonauth email: nirmal.bissonauth@durham.ac.uk
University of Durham www: http://aig-www.dur.ac.uk
-----------------------------------------------------------------------

From joelja at darkwing.uoregon.edu Thu Nov 15 10:35:27 2001 From: joelja at darkwing.uoregon.edu (Joel Jaeggli) Date: Wed Nov 25 01:01:52 2009 Subject: Tyan Tunder K7 and Gigabit Ethernet cards In-Reply-To: Message-ID:

I'd recommend trying the latest natsemi dp8381x driver (1.11) from Donald Becker. It's on the Scyld website, or in the recent kernel sources.

joelja

On Thu, 15 Nov 2001, Nirmal Bissonauth wrote:
> Hi all,
>
> I would like to know if people have been successful in using gigabit
> ethernet (over copper) cards with the Tyan Thunder K7 motherboard (S2462).
> This has two built-in 3Com 100Base-T cards.
>
> I have tried to use a D-Link DGE-550T card with the latter, but without much
> success. Even after disabling the onboard NICs, the card did not work
> properly. The problem is that an interrupt is not set after a DMA
> transmit (something to do with the APIC, I presume). I tried Linux kernel
> 2.4.12-ac3 with the latest driver from D-Link's website, but that did not
> make much difference either. I have six of these.
>
> The cards that I am particularly interested to hear about are the
> 3Com Gigabit Network card (3c996-T) or (3c996B-T) or (3c1000-T)
> Intel PRO/1000 Gigabit Server 1000B-T PCI Adapter (PWLA8490T)
> NETGEAR 100/1000Base-T Copper Gigabit Ethernet Adapter (GA-623T)
> Netgear 100/1000BASET 32/64 BIT PCI Gigabit Adapter (GA-622T)
> SMC TigerCard 1000BaseTX 32/64Bit PCI Gb Ethernet Nic (SMC9462TX)
>
> Or any other cheap gigabit network cards.
>
> Regards
> Nirmal
>
> -----------------------------------------------------------------------
> Nirmal Bissonauth email: nirmal.bissonauth@durham.ac.uk
> University of Durham www: http://aig-www.dur.ac.uk
> -----------------------------------------------------------------------
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
--------------------------------------------------------------------------
Joel Jaeggli              joelja@darkwing.uoregon.edu
Academic User Services    consult@gladstone.uoregon.edu
PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E
--------------------------------------------------------------------------
It is clear that the arm of criticism cannot replace the criticism of arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy of the right, 1843.

From math at velocet.ca Thu Nov 15 10:44:34 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:52 2009 Subject: Tyan Thunder K7 and Gigabit Ethernet cards In-Reply-To: ; from Nirmal.Bissonauth@durham.ac.uk on Thu, Nov 15, 2001 at 05:49:24PM +0000 References: Message-ID: <20011115134434.Y66460@velocet.ca>

On Thu, Nov 15, 2001 at 05:49:24PM +0000, Nirmal Bissonauth's all...
> Hi all,
>
> I would like to know if people have been successful in using gigabit
> ethernet (over copper) cards with the Tyan Thunder K7 motherboard (S2462).
> This has two built-in 3Com 100Base-T cards.
>
> I have tried to use a D-Link DGE-550T card with the latter, but without much
> success. Even after disabling the onboard NICs, the card did not work
> properly. The problem is that an interrupt is not set after a DMA
> transmit (something to do with the APIC, I presume). I tried Linux kernel
> 2.4.12-ac3 with the latest driver from D-Link's website, but that did not
> make much difference either. I have six of these.

I have a single DGE-500T sitting on a Tyan Tiger talking to an SMC card also based on the NS82830 chipset. No problems sending data EXCEPT in FreeBSD: soon after a lot of data flows bidirectionally, the card drops the carrier. (In fact, any NS82830 card has the same problem.) Or at least the OS tells me that. I think there's a problem with the FBSD nge driver, but that might be fixed (I think I'm on 4.3 here).

With a Linux 2.4.13 kernel (can't remember if I dumped appropriate -ac patches in or not - I usually do) with the appropriate GbE drivers compiled in, I had no problems with dropped carriers except on one card that was finicky with everything. But otherwise I've had no problems. You on Linux or FreeBSD? (How many other people use FBSD for clustering?)

> The cards that I am particularly interested to hear about are the
> 3Com Gigabit Network card (3c996-T) or (3c996B-T) or (3c1000-T)
> Intel PRO/1000 Gigabit Server 1000B-T PCI Adapter (PWLA8490T)
> NETGEAR 100/1000Base-T Copper Gigabit Ethernet Adapter (GA-623T)
> Netgear 100/1000BASET 32/64 BIT PCI Gigabit Adapter (GA-622T)
> SMC TigerCard 1000BaseTX 32/64Bit PCI Gb Ethernet Nic (SMC9462TX)

Got a mix and match of GbE cards here - SMC 9452 (not -62), a couple of ARKs and a Linksys, all based on the 82830 chip. Only one did not work - it would drop carrier as soon as any amount of data went through it; but then again I'm not using Cat6e here, just Cat5+ cables, which worked between all other pairs of my 6 cards.

I find relatively low system/interrupt time spent on the 82830 cards, like 2% CPU for sending about 150-200Mbps down the wire and receiving the same at the same time with avg. size 1K packets (gromacs 3.0 d.dppc benchmark).

Here's an interesting thing that I came across on the link on /. today regarding FBSD vs Linux: http://www.byte.com/documents/s=1794/byt20011107s0001/1112_moshe.html

    On the Linux side, I attached all interrupts coming from the network adaptor to one CPU. With the new TCP/IP stack in the 2.4 kernels this really becomes necessary. Otherwise, you might find the incoming packets arranged out of order, because later interrupts are serviced (on another CPU) before earlier ones, thus requiring a reordering further down the handling layers.

Can FreeBSD do that? Sounds like a way to ensure further efficiency.

/kc

> Or any other cheap gigabit network cards.
>
> Regards
> Nirmal
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA
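The trick quoted from the byte.com article is exposed through /proc on Linux 2.4 kernels; a minimal sketch (the interface name and IRQ number below are invented for the example, not anything from Ken's setup):

    #!/bin/sh
    # find out which IRQ the NIC owns (eth1 is an example)
    grep eth1 /proc/interrupts

    # say it reported IRQ 24: pin it to CPU 0 only.
    # the file holds a hex bitmask of allowed CPUs (1 = CPU0, 2 = CPU1, 3 = both)
    echo 1 > /proc/irq/24/smp_affinity

    cat /proc/irq/24/smp_affinity   # verify the new mask

Whether FreeBSD has an equivalent knob is exactly the open question above.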
From mas at ucla.edu Thu Nov 15 10:59:06 2001 From: mas at ucla.edu (Michael Stein) Date: Wed Nov 25 01:01:52 2009 Subject: Scyld 27bz-8 problem (symptom: netstat) In-Reply-To: <200111151548.fAFFm8K20412@kendall.umd.edu>; from Bill Dorland on Thu, Nov 15, 2001 at 10:48:08AM -0500 References: <200111151548.fAFFm8K20412@kendall.umd.edu> Message-ID: <20011115105906.A22242@mas1.oac.ucla.edu>

> If so, I am curious whether anyone else has experienced an incorrect
> response from the command 'netstat -avupt' when executed as root. I
> find that the system does not believe root is root.

I see this on several systems, one behind a firewall and I'd guess never attacked (the firewall doesn't allow inbound anything, even ssh). This is a RH 7.0 system with kernel 2.2.16-22. netstat is from net-tools-1.56 (RH 7.0).

I suspect it's just a partially built internal file control block of some sort in the kernel. Find the process id for "[mdrecoveryd]", cd to /proc/<pid> and then try to ls the fd directory.

I traced it this far by running a recompiled (COPTS=-g) netstat under gdb as root with a breakpoint in netstat.c function prg_cache_load, where the variable eacces gets set to 1. Further tracing would probably have to be in the kernel.

From raij at cs.unc.edu Thu Nov 15 11:01:05 2001 From: raij at cs.unc.edu (Andrew B. Raij) Date: Wed Nov 25 01:01:52 2009 Subject: Public slaves In-Reply-To: Message-ID:

I've heard much about Scyld's cluster management tools, so I thought it made sense to stick with Scyld and modify things to fit my situation. If I were to use kickstart and a standard Linux distro, what would I be losing from Scyld?

-Andrew

On Thu, 15 Nov 2001, William T. Rankin wrote:
> > From: "Andrew B. Raij"
> >
> > Hi everybody,
> >
> > I'd like to set up a Scyld cluster with slaves open to the public
> > network. I'd also like each slave to get the same IP of my choosing every
> > time it is booted, and slave IPs shouldn't have to be confined to any
> > specific range. I understand that doing this is contradictory to the
> > Beowulf design, but is it possible?
> > What you are talking about is to set up all the nodes as general > purpose workstations and using them as a cluster. This isn't > "contrary" to the beowulf design (that's how my first cluster was > set up). It is contrary IIRC to the basic Scyld assumptions. > > Have you considered just using kickstart with a standard linux > distribution to configure your machines? Or is there something > specific to Scyld that you are interested in? > > -bill > > From jgl at unix.shell.com Thu Nov 15 11:22:10 2001 From: jgl at unix.shell.com (J. G. LaBounty) Date: Wed Nov 25 01:01:52 2009 Subject: Tyan Tunder K7 and Gigabit Ethernet cards In-Reply-To: Your message of Thu, 15 Nov 2001 17:49:24 +0000. Message-ID: <200111151922.NAA08339@volta.shell.com> We are using the Intel PRO/1000T adapter on a Tyan S2460 but you have to build the driver which Intel supplies on their web site. This makes it a pain to install but once you have they perform well. The kernel we are using is 2.4.13-ac5. > From: Nirmal Bissonauth > Reply-To: Nirmal Bissonauth > To: beowulf@beowulf.org > Subject: Tyan Tunder K7 and Gigabit Ethernet cards > Date: Thu, 15 Nov 2001 17:49:24 +0000 (GMT) > > Hi all, > > I would like to know if people have been successful in using gigabit > ethernet(over copper) cards with the Tyan Thunder K7 motherboard. (s2462) > This has two built-in 3com 100 Base T cards. > > I have tried to use a DLINK DGE-550T card with the latter but without much > success. Even after disabling the onboard NICs, the card did not work > properly. The problem is that an interrupt is not set after a DMA > transmitt (something to do with the APIC I presume). I tried linux kernel > 2.4.12-ac3 with the latest driver from Dlinks website, but that did not > make much difference either. I have six of these. > > The cards that I am particularly interested to hear about are the > 3com Gigabit Network card (3c996-T) or (3c996B-T) or (3c1000-T) > Intel PRO/1000 Gigabit Server 1000B-T PCI Adapter (PWLA8490T) > NETGEAR 100/1000Base-T Copper Gigabit Ethernet Adapter (GA-623T) > Netgear 100/1000BASET 32/64 BIT PCI Gigabit Adapter (GA-622T) > SMC TigerCard 1000BaseTX 32/64Bit PCI Gb Ethernet Nic (SMC9462TX) > > Or any other cheap gigabit network cards. > > Regards > Nirmal > > ----------------------------------------------------------------------- > Nirmal Bissonauth email: nirmal.bissonauth@durham.ac.uk > University of Durham www: http://aig-www.dur.ac.uk > ----------------------------------------------------------------------- > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Peter.Lindgren at experian.com Thu Nov 15 11:54:34 2001 From: Peter.Lindgren at experian.com (Peter Lindgren) Date: Wed Nov 25 01:01:52 2009 Subject: Comparison of clustering systems? Message-ID: Top500 has a poll asking "what cluster system do you use?". They offer the choices: Oscar Scyld Score SCE NPACI Rocks MSC.Linux Other Follow this link to see current results: http://clusters.top500.org/pollbooth.php?qid=clustersys&aid=-1 So far I've actually installed and gotten my application running with Scyld. I have most of the other systems on CD. I was about to try Rocks. But to try them all is going to take me a long time.... I wonder whether anyone has recently done a comparison of these (or other) systems? (I also know of Cplant and IBM's CSK.) 
I found an NHSE article from 1996, but that's ancient history.

Peter Lindgren
Phone: 847 944 4515
Fax: 847 517 5889
E-mail: peter.lindgren@experian.com

From SGaudet at turbotekcomputer.com Thu Nov 15 12:38:35 2001 From: SGaudet at turbotekcomputer.com (Steve Gaudet) Date: Wed Nov 25 01:01:52 2009 Subject: Tyan Tunder K7 and Gigabit Ethernet cards Message-ID: <3450CC8673CFD411A24700105A618BD6170F5B@911TURBO>

Hello,

> I would like to know if people have been successful in using gigabit
> ethernet (over copper) cards with the Tyan Thunder K7 motherboard (S2462).
> This has two built-in 3Com 100Base-T cards.
>
> I have tried to use a D-Link DGE-550T card with the latter, but without much
> success. Even after disabling the onboard NICs, the card did not work
> properly. The problem is that an interrupt is not set after a DMA
> transmit (something to do with the APIC, I presume). I tried Linux kernel
> 2.4.12-ac3 with the latest driver from D-Link's website, but that did not
> make much difference either. I have six of these.
>
> The cards that I am particularly interested to hear about are the
> 3Com Gigabit Network card (3c996-T) or (3c996B-T) or (3c1000-T)
> Intel PRO/1000 Gigabit Server 1000B-T PCI Adapter (PWLA8490T)
> NETGEAR 100/1000Base-T Copper Gigabit Ethernet Adapter (GA-623T)
> Netgear 100/1000BASET 32/64 BIT PCI Gigabit Adapter (GA-622T)
> SMC TigerCard 1000BaseTX 32/64Bit PCI Gb Ethernet Nic (SMC9462TX)
>
> Or any other cheap gigabit network cards.
> --------------------------------------------------------------

Ever look at SysKonnect? http://www.syskonnect.com/syskonnect/products/sk-98xx.htm Had very good luck with them, and they are based in Europe.

Cheers, Steve Gaudet ..... <(???)>
============================================================================
| Turbotek Computer Corp.      tel:603-666-3062 ext. 21                    |
| 8025 South Willow St.        fax:603-666-4519                            |
| Manchester, NH 03103         e-mail:sgaudet@turbotekcomputer.com         |
| toll free:800-573-5393       web: http://www.turbotekcomputer.com        |
============================================================================

From opengeometry at yahoo.ca Thu Nov 15 13:11:10 2001 From: opengeometry at yahoo.ca (William Park) Date: Wed Nov 25 01:01:52 2009 Subject: Comparison of clustering systems? In-Reply-To: ; from Peter.Lindgren@experian.com on Thu, Nov 15, 2001 at 01:54:34PM -0600 References: Message-ID: <20011115161110.A1377@node0.opengeometry.ca>

On Thu, Nov 15, 2001 at 01:54:34PM -0600, Peter Lindgren wrote:
> Top500 has a poll asking "what cluster system do you use?". They offer
> the choices:
>
> Oscar
> Scyld
> Score
> SCE
> NPACI Rocks
> MSC.Linux
> Other

What.. no Mosix?

--
William Park, Open Geometry Consulting, . 8 CPU cluster, NAS, (Slackware) Linux, Python, LaTeX, Vim, Mutt, Tin

From france at handhelds.org Thu Nov 15 15:57:43 2001 From: france at handhelds.org (George France) Date: Wed Nov 25 01:01:52 2009 Subject: Compile farm? In-Reply-To: References: Message-ID: <01111518574301.18513@shadowfax.middleearth>

Greetings,

Install pvm; there are patches for 'gnu make' to use pvm, then just do a "make -j". Simple, easy, and it works for me on i686, alpha and the ARM arch.

Best Regards,

--George

On Friday 02 November 2001 14:25, Scott Thomason wrote:
> Greetings. I'm interested in setting up a shell account/batch
> process/compile farm system for our developers, and I'm wondering if
> Beowulf clusters are well suited to that task.
> We're not interested in writing parallel code using PVM or MPI, we just
> want to log into what appears to be one big server and have it dispatch
> the workload amongst the slave processors. Is Beowulf good at that?
> ---scott
>
> p.s. Sorry if there are duplicates of this message; I used the wrong email
> address earlier.
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

From erayo at cs.bilkent.edu.tr Fri Nov 16 04:22:00 2001 From: erayo at cs.bilkent.edu.tr (Eray Ozkural (exa)) Date: Wed Nov 25 01:01:52 2009 Subject: Comparison of clustering systems? In-Reply-To: <20011115161110.A1377@node0.opengeometry.ca> References: <20011115161110.A1377@node0.opengeometry.ca> Message-ID:

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On Thursday 15 November 2001 23:11, William Park wrote:
> > Oscar
> > Scyld
> > Score
> > SCE
> > NPACI Rocks
> > MSC.Linux
> > Other
>
> What.. no Mosix?

Perhaps they're talking about HPC solutions only.

- -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara www: http://www.cs.bilkent.edu.tr/~erayo GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE79QTofAeuFodNU5wRAoEIAJ9vHp/aSKACsRnJg4GFL8a/N/P+GgCfQZU5
WLAp1PsFnnwMPndg4lX5UwY=
=/AGJ
-----END PGP SIGNATURE-----

From rgb at phy.duke.edu Fri Nov 16 05:45:38 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:52 2009 Subject: Public slaves In-Reply-To: Message-ID:

On Thu, 15 Nov 2001, Andrew B. Raij wrote:

> I've heard much about Scyld's cluster management tools, so I thought it
> made sense to stick with Scyld and modify things to fit my situation. If
> I were to use kickstart and a standard Linux distro, what would I be
> losing from Scyld?

A better way to put it would be: what do you need from Scyld on your cluster? As you say, Scyld has cluster management tools and so forth, but clusters existed for years before Scyld and it isn't too hard to set up a cluster without it. Indeed, if your cluster is intended to be a compute farm where you would like folks to be able to log in to nodes one at a time by name to do work (which seems quite possible if you want to give explicit nodes permanent names), then Scyld is quite possibly not your best bet, as it follows the "true beowulf" paradigm of the cluster being a single-headed virtual parallel supercomputer, where you would no more log in to a node than you would log in to a specific processor in a big SMP box.

I will echo Bill's suggestion, as it is how we set up our clusters here as well (they are primarily compute farms used to run many instances of embarrassingly parallel code for e.g. Monte Carlo or nuclear theory computations, generating slices of ongoing collision processes, for example). The engineering of the cluster is pretty simple:

a) Server(s) provide(s) shared NFS mounts to nodes for users, DHCP for nodes, NFS or FTP or HTTP export of e.g. RH distro and kickstart files.

b) Build a kickstart file for a "typical node". I can give you the one we use here if you need it.
We make the nodes relatively "fat", since they have small local hard disks and "small" local hard disks are currently so absurdly large that you could drop three or four completely different OS installations on them and still have room for swap and twenty GB of user scratch space. In fact, you could easily install RH AND Scyld on the nodes and select which way you wanted to boot the cluster at boot time. It's just a matter of how you choose to partition -- save a 4 GB partition per boot. The kickstart file specifies how the node disk is to be laid out, packages to be installed, what (if any) video support, and more, culminating in a post-install script that can be run to "polish" the setup -- installing the appropriate /etc/passwd, /etc/shadow, /etc/group, building /etc/fstab, and so forth.

c) Set up dhcpd.conf on the dhcp server. Here is a typical node entry for my "ganesh" cluster:

host g01 {
    hardware ethernet 00:01:03:BD:C5:7a;
    fixed-address 152.3.182.155;
    next-server install.phy.duke.edu;
    filename "/export/install/linux/rh-7.1/ks/beowulf";
    option domain-name "phy.duke.edu";
    option dhcp-class-identifier "PXEClient";
}

Note that this maps one MAC address to one IP number (in many cases one would assign node addresses out of a private internal network space like 192.168.x.x -- these nodes for the time being are publicly accessible and secured like any other workstation). One defines the name of the server to be used by name or IP number. Elsewhere there are global definitions for things like NIS servers, nameservers, and the like, so the booted host knows how to resolve the name. filename gives the path to the kickstart file that will then direct the install. If one wishes to provide it from a web or ftp server, prepend the appropriate http://. The other options are local (and hopefully obvious in purpose). This particular node has PXE booting set up and can be installed by just turning it on. Without this, one probably needs a boot floppy from the matching distribution and a floppy drive per node.

Once these things are set up, one merely boots the system. If you use a boot floppy, just enter "ks" at the boot prompt when requested, OR cut a custom boot floppy where ks is the default (I generally do this for nodes, as it means that you don't need a monitor or keyboard to reinstall). Otherwise it is pretty much just turn it on. On a good day, it will boot, find the dhcp server, get an IP number and identity, and start building, loading, mounting "install" ramdisks as fast as the network and server load permit. (If PXE booting, it does all this but in a somewhat different order, as it has to get the boot kernel over the network first.) It then rips through the kickstart file's instructions (partition and format the disk, install swap, and start installing packages). When finished, it runs the post script, which can end with instructions to reboot the newly installed node ready for production.

On a good day, we can reinstall nodes in about 3-4 minutes. In fact, when I give folks a tour of our cluster, I generally include a reinstall of a node just to show them how trivial it is. We keep (or rather can build dynamically on demand) a special "install" lilo.conf file on the systems so that we can even reinstall them remotely from a script -- copy in the install lilo.conf, run lilo, reboot (system installs and reboots into operational mode).
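That remote reinstall takes only a few lines of shell. A rough sketch under assumed names (lilo.install.conf, the node list, and password-less root ssh are all illustrative assumptions, not rgb's actual script):

    #!/bin/sh
    # push each node through a fresh kickstart install: point lilo at the
    # "install" configuration and reboot; the node rebuilds itself and
    # comes back up in operational mode.
    for node in g01 g02 g03; do
        ssh root@$node 'lilo -C /etc/lilo.install.conf && shutdown -r now' &
    done
    wait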
An impressive display of the scalability of modern Linux distributions, since exactly the same trick will work for every workstation in an organization. To manage a network, one only needs to "work" on servers (as it should be). Nodes, workstation clients, desktops, all of them should be complete kickstart boilerplate with minimal customization, all encapsulated in a (possibly host-specific) kickstart file. If one crashes or becomes corrupt or is cracked, a three or four minute reboot and it has a clean, fresh install.

Regarding parallel computing support, of course your kickstart file can contain e.g. MPI(s) of your preferred flavor(s), PVM, and so forth. It can also include at least the standard remote workstation management tools, e.g. syslog-ng, and perhaps a few that are more cluster management/monitoring tools, although there is indeed still a bit of a dearth of these in the mainstream linuces. You have to decide whether you are willing to live with these tools in order to have nodes that look like remote-access workstations, or would prefer Scyld's paradigm of nodes that look like multiple processors on a single system (with matching "single system" management tools).

Or both. Set it up to boot both ways on demand, and see which one works better for you. Neither one is particularly difficult to build and configure, and the time you save making the truly correct decision for your enterprise will likely pay for the time you spend figuring out the truly correct decision to make.

rgb

>
> -Andrew
>
> On Thu, 15 Nov 2001, William T. Rankin wrote:
>
> > > From: "Andrew B. Raij"
> > >
> > > Hi everybody,
> > >
> > > I'd like to set up a Scyld cluster with slaves open to the public
> > > network. I'd also like each slave to get the same IP of my choosing every
> > > time it is booted, and slave IPs shouldn't have to be confined to any
> > > specific range. I understand that doing this is contradictory to the
> > > Beowulf design, but is it possible?
> >
> > What you are talking about is to set up all the nodes as general
> > purpose workstations and using them as a cluster. This isn't
> > "contrary" to the beowulf design (that's how my first cluster was
> > set up). It is contrary IIRC to the basic Scyld assumptions.
> >
> > Have you considered just using kickstart with a standard linux
> > distribution to configure your machines? Or is there something
> > specific to Scyld that you are interested in?
> >
> > -bill
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu

From mcosta at fc.up.pt Fri Nov 16 09:16:59 2001 From: mcosta at fc.up.pt (Miguel Costa) Date: Wed Nov 25 01:01:52 2009 Subject: Rexec to Scyld nodes Message-ID: <3BF54A0B.5020608@fc.up.pt>

Hello everybody, I just got Scyld from Linux Central and tried it on a cluster of five dual Athlon machines. Everything works fine when I run MPI applications, and I really like this "one big SMP machine" design, as opposed to a cluster of workstations, but I also need other people to be able to run independent (non-MPI) programs on the nodes.

How can these processes be started on the nodes (without the users including bproc routines in their code)?
Can they use something like rexec, or do I have to switch between Scyld Beowulf and Red Hat workstations depending on what I want to do? I already have Red Hat installed on all the machines, but it would be better if I didn't have to reboot between "parallel multicomputer" and "computing farm".

hope this is clear, regards, Miguel Costa University of Porto Portugal

From eswardev at yahoo.com Fri Nov 16 10:15:01 2001 From: eswardev at yahoo.com (Eswar Dev) Date: Wed Nov 25 01:01:52 2009 Subject: Pgi Atlas on Linux_Athlon Message-ID: <20011116181501.82208.qmail@web14310.mail.yahoo.com>

Hi! I am getting bad results with PGI-ATLAS on a Linux-based Athlon. I need to get more speedup than what it shows here. Has anyone had similar problems? Help needed. Thanks!!!!

-Eswarkumar visit:http://atlantis.engr.odu.edu:8080
______________________________________________________

This is for ./xsl3blastst
------------- GEMM ----------------------------------
TST# A B    M    N    K ALPHA  LDA  LDB  BETA  LDC   TIME  MFLOP  SpUp  TEST
==== = = ==== ==== ==== ===== ==== ==== ===== ====  =====  =====  ====  ====
   0 N N  100  100  100   1.0 1000 1000   1.0 1000   0.00    0.0  1.00 -----
   0 N N  100  100  100   1.0 1000 1000   1.0 1000   0.00    0.0  0.00  PASS
   1 N N  200  200  200   1.0 1000 1000   1.0 1000   0.06  266.7  1.00 -----
   1 N N  200  200  200   1.0 1000 1000   1.0 1000   0.06  266.7  1.00  PASS
   2 N N  300  300  300   1.0 1000 1000   1.0 1000   0.28  192.9  1.00 -----
   2 N N  300  300  300   1.0 1000 1000   1.0 1000   0.21  257.1  1.33  PASS
   3 N N  400  400  400   1.0 1000 1000   1.0 1000   0.91  140.7  1.00 -----
   3 N N  400  400  400   1.0 1000 1000   1.0 1000   0.51  251.0  1.78  PASS
   4 N N  500  500  500   1.0 1000 1000   1.0 1000   2.07  120.8  1.00 -----
   4 N N  500  500  500   1.0 1000 1000   1.0 1000   0.99  252.5  2.09  PASS
   5 N N  600  600  600   1.0 1000 1000   1.0 1000   3.85  112.2  1.00 -----
   5 N N  600  600  600   1.0 1000 1000   1.0 1000   1.69  255.6  2.28  PASS
   6 N N  700  700  700   1.0 1000 1000   1.0 1000   6.26  109.6  1.00 -----
   6 N N  700  700  700   1.0 1000 1000   1.0 1000   2.70  254.1  2.32  PASS
   7 N N  800  800  800   1.0 1000 1000   1.0 1000   9.44  108.5  1.00 -----
   7 N N  800  800  800   1.0 1000 1000   1.0 1000   4.03  254.1  2.34  PASS
   8 N N  900  900  900   1.0 1000 1000   1.0 1000  13.47  108.2  1.00 -----
   8 N N  900  900  900   1.0 1000 1000   1.0 1000   5.72  254.9  2.35  PASS
   9 N N 1000 1000 1000   1.0 1000 1000   1.0 1000  18.39  108.8  1.00 -----
   9 N N 1000 1000 1000   1.0 1000 1000   1.0 1000   7.87  254.1  2.34  PASS

10 tests run, 10 passed

__________________________________________________ Do You Yahoo!? Find the one for you at Yahoo! Personals http://personals.yahoo.com

From agrajag at scyld.com Fri Nov 16 19:23:57 2001 From: agrajag at scyld.com (Sean Dilda) Date: Wed Nov 25 01:01:52 2009 Subject: Rexec to Scyld nodes In-Reply-To: <3BF54A0B.5020608@fc.up.pt>; from mcosta@fc.up.pt on Fri, Nov 16, 2001 at 05:16:59PM +0000 References: <3BF54A0B.5020608@fc.up.pt> Message-ID: <20011116222357.A12504@blueraja.scyld.com>

On Fri, 16 Nov 2001, Miguel Costa wrote:

> How can these processes be started on the nodes (without the users
> including bproc routines in their code)?
>
> Can they use something like rexec, or do I have to switch between Scyld
> Beowulf and Red Hat workstations depending on what I want to do?

Are you talking about rexec the program or the function call? If you are talking about the program, you should be able to use bpsh instead. If it's the function call, you should be able to use bproc_execmove() instead. This is a BProc function call, but it isn't any more complicated than the rexec() function call.

Sean
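As a concrete illustration of the bpsh route Sean mentions, independent jobs can be fanned out from an ordinary shell; the node numbers and program names here are examples only, and input files are assumed visible on the nodes (e.g. over NFS):

    # run ./myjob on node 3, with its output coming back to your terminal
    bpsh 3 ./myjob input.dat

    # one independent job per node, from a plain shell loop
    for n in 0 1 2 3 4; do
        bpsh $n ./myjob run$n.dat > run$n.log 2>&1 &
    done
    wait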
From rajkumar at csse.monash.edu.au Sat Nov 17 01:34:42 2001 From: rajkumar at csse.monash.edu.au (Rajkumar Buyya) Date: Wed Nov 25 01:01:52 2009 Subject: Cluster books in Chinese Message-ID: <3BF62F32.A2D9BA65@csse.monash.edu.au>

Dear All, I am pleased to inform you that our two-volume book on High Performance Cluster Computing, http://www.csse.monash.edu.au/~rajkumar/cluster/, published by Prentice Hall, USA (English version), is now available in the Chinese language: http://www.csse.monash.edu.au/~rajkumar/cluster/chinese/ published by the Publishing House of Electronics Industry (PHEL), Beijing, China.

Hopefully this helps in enhancing the adoption and usage of cluster technologies in Chinese regions -- I was told that the book is available in China at affordable cost ($20 for both volumes).

-- Best regards, Raj

PS: Both versions have a chapter on "Beowulf"!

------------------------------------------------------------------------
Rajkumar Buyya
School of Computer Science and Software Engineering
Monash University, C5.41, Caulfield Campus
Melbourne, VIC 3145, Australia
Phone: +61-3-9903 1969 (office); +61-3-9571 3629 (home)
Fax: +61-3-9903 2863; eFax: +1-801-720-9272
Email: rajkumar@buyya.com | rajkumar@csse.monash.edu.au
URL: http://www.buyya.com | http://www.csse.monash.edu.au/~rajkumar
------------------------------------------------------------------------

From zadok at phreaker.net Thu Nov 1 14:28:47 2001 From: zadok at phreaker.net (Hereward Cooper) Date: Wed Nov 25 01:01:52 2009 Subject: [ot] Re: AMD testing In-Reply-To: References: Message-ID: <20011101222847.1136f542.zadok@phreaker.net>

[I'm off-list so please reply to me directly as well as the list]

Has any user of the Tiger MP S2460 had experience of what happens if you DON'T use registered memory? Will it blow up :-) ??

Thanks, Hereward

From SThomaso at phmining.com Thu Nov 1 16:25:08 2001 From: SThomaso at phmining.com (Scott Thomason) Date: Wed Nov 25 01:01:52 2009 Subject: Compile farm? Message-ID:

Greetings. I'm interested in setting up a shell account/batch process/compile farm system for our developers, and I'm wondering if Beowulf clusters are well suited to that task. We're not interested in writing parallel code using PVM or MPI, we just want to log into what appears to be one big server and have it dispatch the workload amongst the slave processors. Is Beowulf good at that?
---scott

From ron_chen_123 at yahoo.com Fri Nov 2 11:44:54 2001 From: ron_chen_123 at yahoo.com (Ron Chen) Date: Wed Nov 25 01:01:52 2009 Subject: Compile farm? Message-ID: <20011102194454.28360.qmail@web14707.mail.yahoo.com>

What you need is a batch system. There are two free batch systems, SGE and PBS. Both of them are open source, but nevertheless you can get 7x24 support if you are willing to pay.

PBS: www.openpbs.com www.pbspro.com
SGE: www.sun.com/gridware gridengine.sunsource.net

Also, SGE has qmake, which can execute several instances of make on multiple machines for one single make job. Install note: http://supportforum.sun.com/gridengine/appnote_install.html

-Ron

--- Scott Thomason wrote:
> Greetings. I'm interested in setting up a shell
> account/batch process/compile farm system for our
> developers, and I'm wondering if Beowulf clusters
> are well suited to that task. We're not interested
> in writing parallel code using PVM or MPI, we just
> want to log into what appears to be one big server
> and have it dispatch the workload amongst the slave
> processors. Is Beowulf good at that?
> ---scott
>
> p.s. Sorry if there are duplicates of this message;
> I used the wrong email address earlier.
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or
> unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

__________________________________________________ Do You Yahoo!? Find a job, post your resume. http://careers.yahoo.com

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
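To make the batch-system route Ron describes concrete: under PBS a compile job is just a shell script with scheduler directives at the top. A minimal generic sketch (the job name, resource request and build command are invented for illustration):

    #!/bin/sh
    #PBS -N nightly-build
    #PBS -l nodes=1
    #PBS -j oe
    # runs on whichever node the scheduler picks;
    # stdout/stderr come back as nightly-build.o<jobid>
    cd $PBS_O_WORKDIR
    make -j 2 all

Submit it with "qsub build.sh" and watch it with "qstat"; the batch system, not the user, decides which slave processor does the work, which is exactly the "one big server" behavior Scott asked for.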
From jhalpern at howard.edu Fri Nov 2 18:30:33 2001 From: jhalpern at howard.edu (Joshua Halpern) Date: Wed Nov 25 01:01:52 2009 Subject: Gigabit Ethernet switches and network adaptors. References: Message-ID: <3BE356C9.D5353B6B@howard.edu>

I am in the process of specifying components for a small (8-node, Athlon-based) cluster. In searching the net I came across a reasonably priced 8-port copper gigabit switch and network adapters.

I have a friend who says that the things that he makes are inexpensive, not cheap. Does anyone know whether the following are inexpensive, or just cheap? Any experience with them?

TrendNet TEG-S80TX 8-port unmanaged switch - $799
http://www.trendware.com/products/TEG-S80TX.htm
Price: http://www.csocomputers.com/Hardware/Networking/Trendnet/Gigabit.htm
Review: http://www.8wire.com/articles/?aid=2300

TrendNet PCITX 32-bit PCI network adapter - $69
http://www.trendware.com/products/TEG-PCITX.htm

or the

Accton EN1408T 32-bit PCI network adapter - $99
Review: http://www.8wire.com/articles/index.asp?AID=2212
http://www.8wire.com/articles/index.asp?AID=2280
> > --javier > > -- > Kate Stevensen sagt: Meine Mission ist geheim! Finde es raus! > http://www.sunrise.net/exclude/track/action.asp?PID_S=592&PID_T=593&LID=1 > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From Conrad.Geiger at Sun.COM Thu Nov 8 10:17:01 2001 From: Conrad.Geiger at Sun.COM (Conrad Geiger - Sun Academic Region HPC Technologist) Date: Wed Nov 25 01:01:52 2009 Subject: SGE - Grid Engine Training at SC2001, November 12 In-Reply-To: "Your message with ID" Message-ID: For those that are attending SC2001, there is a free Grid Engine (SGE) training session available for you. If you are interested in this open source Beowulf job management system and would like to attend, please email me and show up at the Denver location and time listed below: Class: SGE (Grid Engine) training Date: Monday, November 12 Time: 1:00 p.m. - 4:00 p.m. Classroom location: Colorado Ballroom F Marriott Hotel, 1701 California Street, Denver (near Denver Convention Center) AGENDA GRID ENGINE (SGE) TECHNICAL PRESENTATION: Sun Grid Engine (1 hour) -- overview of concepts -- installation options -- architecture -- information flow -- scheduling -- complexes and resource management -- parallel and checkpointing Examples (30 minutes) -- complexes -- load sensor -- license management -- immediate vs. low priority jobs SGE/EE technology (15 minutes) -- tickets -- share tree, functional, deadline, override Grid Engine Integration with ClusterTools (20 minutes) Grid Engine Open Source Project and API initiative (20 minutes) Conrad.Geiger@Sun.COM >----------------Begin Forwarded Message----------------< From: Ron Chen Subject: Re: Compile farm? To: Scott Thomason , Beowulf@beowulf.org Date: Fri, 2 Nov 2001 11:44:54 -0800 (PST) What you need is a batch system. There are 2 free batch systems, SGE and PBS. Both of them are opensource, but nevertheless, you can get 7x24 support if you are willing to pay. PBS: www.openpbs.com www.pbspro.com SGE: www.sun.com/gridware gridengine.sunsource.net Also, SGE has qmake, which can execute several instances of make on mutliple machines for one single make job. Install note: http://supportforum.sun.com/gridengine/appnote_install.html -Ron --- Scott Thomason wrote: > Greetings. I'm interested in setting up a shell > account/batch process/compile farm system for our > developers, and I'm wondering if Beowulf clusters > are well suited to that task. We're not interested > in writing parallel code using PVM or MPI, we just > want to log into what appears to be one big server > and have it dispatch the workload amongst the slave > processors. Is Beowulf good at that? > ---scott > > p.s. Sorry if there are duplicates of this message; > I used the wrong email address earlier. > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or > unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf __________________________________________________ Do You Yahoo!? Find a job, post your resume. 
http://careers.yahoo.com

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

>----------------End Forwarded Message----------------<

From dabige1 at yahoo.com Fri Nov 9 09:20:10 2001 From: dabige1 at yahoo.com (Elie Bitton) Date: Wed Nov 25 01:01:52 2009 Subject: Starfire on RedHat 7.2 Message-ID: <008301c16942$c9fe2a80$0216a8c0@ebitton>

Hi, I was trying to get my quad card (Adaptec ANA-62044) working under RedHat 7.2, and after RedHat did not automatically detect it, I found your site as pretty much the only source of information on this card. Anyway... a question:

When I do an insmod starfire with both the starfire.o that came with RedHat 7.2 (/lib/modules/2.4.7-10/kernel/drivers/net/starfire.o) and the one I compiled from your site (http://www.scyld.com) from starfire.c (I compiled pci-scan.c, loaded pci-scan.o with insmod with no errors, and I compiled both of these with the -I flag as mentioned on your site, but had to use slab.h instead of malloc.h per the compiler's suggestion), it gives me the following error:

"starfire.o: init_module: No such device
Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters"

I don't know what base IO address or IRQ the card is using. Is there a way to find out (without having to install any form of Windows in another partition)? I also compiled your starfire-diag tool, but again, I can't test the card if I don't know the IO address.

Hoping you can help, Regards, Elie. I am not on the list... so please e-mail me back privately dabige1@yahoo.com Thanks.
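On finding a card's I/O base and IRQ without Windows: the kernel's PCI layer already reports this. A generic sketch (the grep pattern is just an example; match on whatever vendor string your card reports):

    # on 2.2/2.4 kernels, the kernel's own view of the PCI bus:
    cat /proc/pci

    # or, with the pciutils package installed, verbose per-device info
    # including I/O ports, memory ranges and the assigned IRQ:
    lspci -v | grep -A4 -i adaptec

If the board does not appear in /proc/pci at all, the driver's "init_module: No such device" makes sense: the kernel never saw the card on the bus in the first place.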
From okeefe at sistina.com Sun Nov 11 10:45:26 2001 From: okeefe at sistina.com (Matt Okeefe) Date: Wed Nov 25 01:01:52 2009 Subject: White Paper on Storage Clustering for Beowulfs Message-ID: <20011111124526.A5938@sistina.com>

Hi, there is a new white paper up on the Sistina web page entitled "Accelerating Technical Computing with Sistina's GFS Based Storage Clusters". You can get this paper at our home page: www.sistina.com. It actually covers more than just technical computing, but that is the focus. Feedback and criticism are of course welcome.

Matt O'Keefe CTO, Sistina Software, Inc.

From yoon at bh.kyungpook.ac.kr Sun Nov 18 21:54:32 2001 From: yoon at bh.kyungpook.ac.kr (Yoon Jae Ho) Date: Wed Nov 25 01:01:52 2009 Subject: Pgi Atlas on Linux_Athlon References: <20011116181501.82208.qmail@web14310.mail.yahoo.com> Message-ID: <000a01c170be$b1c8ab60$5f72f2cb@LocalHost>

I found that your test matrix size is very small (the matrix size N is 1000 x 1000). That is why you got such poor results with PGI-ATLAS on a Linux-based Athlon; it is not related to the PGI compiler, the Athlon CPU, or the network. Will you increase your matrix size to 3000 x 3000 or 5000 x 5000, if you have enough local memory (RAM)?

Have a nice day!

Yoon Jae Ho, Seoul, Korea

----- Original Message ----- From: Eswar Dev To: Sent: Saturday, November 17, 2001 3:15 AM Subject: Pgi Atlas on Linux_Athlon

> Hi! I am getting bad results with PGI-ATLAS on a Linux-based Athlon.
> I need to get more speedup than what it shows here.
> Has anyone had similar problems? Help needed. Thanks!!!!
> -Eswarkumar
> visit:http://atlantis.engr.odu.edu:8080
> ______________________________________________________
>
> This is for ./xsl3blastst
> ------------- GEMM ----------------------------------
> TST# A B    M    N    K ALPHA  LDA  LDB  BETA  LDC   TIME  MFLOP  SpUp  TEST
> ==== = = ==== ==== ==== ===== ==== ==== ===== ====  =====  =====  ====  ====
>    0 N N  100  100  100   1.0 1000 1000   1.0 1000   0.00    0.0  1.00 -----
>    0 N N  100  100  100   1.0 1000 1000   1.0 1000   0.00    0.0  0.00  PASS
>    1 N N  200  200  200   1.0 1000 1000   1.0 1000   0.06  266.7  1.00 -----
>    1 N N  200  200  200   1.0 1000 1000   1.0 1000   0.06  266.7  1.00  PASS
>    2 N N  300  300  300   1.0 1000 1000   1.0 1000   0.28  192.9  1.00 -----
>    2 N N  300  300  300   1.0 1000 1000   1.0 1000   0.21  257.1  1.33  PASS
>    3 N N  400  400  400   1.0 1000 1000   1.0 1000   0.91  140.7  1.00 -----
>    3 N N  400  400  400   1.0 1000 1000   1.0 1000   0.51  251.0  1.78  PASS
>    4 N N  500  500  500   1.0 1000 1000   1.0 1000   2.07  120.8  1.00 -----
>    4 N N  500  500  500   1.0 1000 1000   1.0 1000   0.99  252.5  2.09  PASS
>    5 N N  600  600  600   1.0 1000 1000   1.0 1000   3.85  112.2  1.00 -----
>    5 N N  600  600  600   1.0 1000 1000   1.0 1000   1.69  255.6  2.28  PASS
>    6 N N  700  700  700   1.0 1000 1000   1.0 1000   6.26  109.6  1.00 -----
>    6 N N  700  700  700   1.0 1000 1000   1.0 1000   2.70  254.1  2.32  PASS
>    7 N N  800  800  800   1.0 1000 1000   1.0 1000   9.44  108.5  1.00 -----
>    7 N N  800  800  800   1.0 1000 1000   1.0 1000   4.03  254.1  2.34  PASS
>    8 N N  900  900  900   1.0 1000 1000   1.0 1000  13.47  108.2  1.00 -----
>    8 N N  900  900  900   1.0 1000 1000   1.0 1000   5.72  254.9  2.35  PASS
>    9 N N 1000 1000 1000   1.0 1000 1000   1.0 1000  18.39  108.8  1.00 -----
>    9 N N 1000 1000 1000   1.0 1000 1000   1.0 1000   7.87  254.1  2.34  PASS
>
> 10 tests run, 10 passed

---------------------------------------------------------------------
Yoon Jae Ho Economist POSCO Research Institute yoon@bh.kyungpook.ac.kr jhyoon@mail.posri.re.kr
http://ie.korea.ac.kr/~supercom/ Korea Beowulf Supercomputer
http://members.ud.com/services/teams/team.htm?id=264C68D5-CB71-429F-923D-8614F419065D Help the people with your PC
Imagination is more important than knowledge. A. Einstein
http://www.kichun.co.kr 2001.1.6 http://www.c3tv.com 2001.1.10
------------------------------------------------------------------------

From ssy at prg.cpe.ku.ac.th Mon Nov 19 00:05:31 2001 From: ssy at prg.cpe.ku.ac.th (Somsak Sriprayoonsakul) Date: Wed Nov 25 01:01:52 2009 Subject: About SCE - Re: Comparison of clustering systems? Message-ID:

You can get an overview of the SCE distribution at http://www.opensce.org/doc/compaq.pdf.

Somsak

----- Original Message ----- From: "Peter Lindgren" To: Sent: Friday, November 16, 2001 2:54 AM Subject: Comparison of clustering systems?

> Top500 has a poll asking "what cluster system do you use?". They offer the choices:
>
> Oscar
> Scyld
> Score
> SCE
> NPACI Rocks
> MSC.Linux
> Other
>
> Follow this link to see current results:
> http://clusters.top500.org/pollbooth.php?qid=clustersys&aid=-1
>
> So far I've actually installed and gotten my application running with Scyld. I have most of the other systems on CD. I was about to try Rocks. But to try them all is going to take me a long time....
>
> I wonder whether anyone has recently done a comparison of these (or other) systems? (I also know of Cplant and IBM's CSK.)
> I found an NHSE article from 1996, but that's ancient history.
>
>
> Peter Lindgren
> Phone: 847 944 4515
> Fax: 847 517 5889
> E-mail: peter.lindgren@experian.com
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

--
---------------------------------------------
Somsak Sriprayoonsakul
Parallel Research Group
http://prg.cpe.ku.ac.th
---------------------------------------------

From SGaudet at turbotekcomputer.com  Mon Nov 19 07:00:46 2001
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Wed Nov 25 01:01:52 2009
Subject: Gigabit Ethernet switches and network adaptors.
Message-ID: <3450CC8673CFD411A24700105A618BD6170F69@911TURBO>

Hello Joshua,

> I am in the process of specifying components
> for a small (8 node, Athlon based) cluster. In
> searching the net I came across a reasonably
> priced 8 port copper Gigabit switch and
> network adapters.
>
> I have a friend who says that the things that he
> makes are inexpensive, not cheap. Does anyone
> know whether the following are inexpensive,
> or just cheap? Any experience with them?
>
> TrendNet TEG S80TX 8 port unmanaged switch - $799
> http://www.trendware.com/products/TEG-S80TX.htm
> Price:
> http://www.csocomputers.com/Hardware/Networking/Trendnet/Gigabit.htm
> Review: http://www.8wire.com/articles/?aid=2300
>
> TrendNet PCITX 32 bit PCI network adapter - $69
> http://www.trendware.com/products/TEG-PCITX.htm
>
> or the
>
> Accton EN1408T 32 Bit PCI network adapter - $99
> Review: http://www.8wire.com/articles/index.asp?AID=2212
> http://www.8wire.com/articles/index.asp?AID=2280

Here's some of the network GIG E hardware we'd like to recommend:

AceNIC/NetGear GA620(T)/3C985B
SysKonnect
NS chipset:
  Cameo SOHO-GA2000T SOHO-GA2500T
  D-Link DGE-500T
  PureData PDP8023Z-TG
  SMC SMC9462TX
  NetGear GA622

=============================================================

For switches you might also want to look into top lines like HP, 3Com and Cisco, because they offer government, educational, and medical (GEM) discounts. Their discounts are very aggressive. Cisco will also take trade-ins on old equipment.

Cheers,

Steve Gaudet
Linux Sales Engineer
..... <(???)>
===================================================================
| Turbotek Computer Corp.    tel:603-666-3062 ext. 21             |
| 8025 South Willow St.      fax:603-666-4519                     |
| Building 2, Unit 105       toll free:800-573-5393               |
| Manchester, NH 03103       e-mail:sgaudet@turbotekcomputer.com  |
| web: http://www.turbotekcomputer.com                            |
===================================================================

From lmeerkat at yahoo.com  Mon Nov 19 09:47:17 2001
From: lmeerkat at yahoo.com (L G)
Date: Wed Nov 25 01:01:52 2009
Subject: Bootload problems.
Message-ID: <20011119174717.34881.qmail@web20604.mail.yahoo.com>

Hi,

I'm building a Beowulf cluster and came across a problem on one of my machines. It is a PIII-133 with a 10GB hard drive and 256 MB. It has the following partitions:

hda1 -    1    1  Linux
hda2 -    2 1247  Extended
hda5 -    2 1181  Linux
hda6 - 1182 1247  Linux swap

I'm trying to boot this node from a boot floppy disk. When the first step is completely done, then all of a sudden the Linux boot starts to work. I can't figure out what's going on.
I tried to boot it with another order of partitions in the partition table, which was as follows:

hda1 -    1    1  Linux
hda2 -    2 1247  Extended
hda5 -    2   67  Linux swap
hda6 -   68 1247  Linux

and I received another error, which was "Cannot open root device 03:05"; after that the system just started to reboot. The same kind of partition table works fine on another of my machines. I tried to boot the node in question without any hard drive at all, but it didn't work either; it was still looking for a root partition.

Could you help me to solve this problem, please? Thanks.

Lyudmila Gritsenko
Software Developer
Absoft Corp.

=====
Best regards,
Meerkat.

__________________________________________________
Do You Yahoo!?
Find the one for you at Yahoo! Personals http://personals.yahoo.com

From kinghorn at pqs-chem.com  Mon Nov 19 10:03:35 2001
From: kinghorn at pqs-chem.com (Donald B. Kinghorn)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
Message-ID: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net>

... Well, it won't blow up with a kaboom ... According to the Mushkin web site you can use 2 non-registered modules on the Tiger 2460 ...

Now, I'll give my opinion. This board is poorly engineered and has been a pain to make suitable for scientific computing. I've had numerous memory problems with these boards, and most of the problems don't show up as obvious crashes. They are the worst kind of errors -- corrupted results for large jobs -- the kind of thing you might not catch without careful testing. I have found that I cannot reliably use more than 3 memory modules (of any type) on these boards. If you need a 1GB configuration, DON'T USE 4 256MB MODULES. Try 2 512MB, or 2 256MB and 1 512MB module.

I have a cluster running reliably with no detectable errors under any load (FINALLY) using the Tiger 2460 with the 1.03 BIOS, using 2 256MB Crucial reg ECC modules and 1 Infineon (64x4) reg ECC 512MB module on each board. You need to ENABLE quick boot in the BIOS, use ECC SCRUB (maybe not essential), use a recent (>2.4.11) kernel with a stable vm, use an append noapic in lilo, and do NOT use any mem= appends in lilo. All of these steps may not be simultaneously needed, but I am sick of fighting with this motherboard and this configuration seems to work reliably. (Your mileage may vary.)

I'm looking forward to seeing some better dual Athlon boards come to the market. Does anyone have any info on when this may happen?

Best regards
-Don

>[i'm off list so please reply to me directly aswell as the list]
>
>Has any user of the Tiger MP S2460 had experience of what happens if you
>DON'T
>use registered memory? Will it blow up :-) ??
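For anyone wanting to replicate that lilo advice, a minimal /etc/lilo.conf sketch (the kernel image name and root device here are illustrative, not Don's exact setup; rerun /sbin/lilo after editing):

  image=/boot/vmlinuz-2.4.13     # illustrative kernel version
      label=linux
      root=/dev/hda1             # illustrative root device
      append="noapic"            # and no mem= override, per the advice above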
From Mark at MarkAndrewSmith.co.uk  Mon Nov 19 10:11:55 2001
From: Mark at MarkAndrewSmith.co.uk (Mark@MarkAndrewSmith.co.uk)
Date: Wed Nov 25 01:01:53 2009
Subject: Compile farm?
Message-ID: <61DC272A66B8D211BA8200105ADF2D3910E6FF@SERVER01>

Scott,

.... or if you don't need batch but would like some interaction without the use of any special libraries, you might like to have a little look at what the MOSIX guys are doing at http://www.mosix.org/ or http://www.mosix.cs.huji.ac.il/ (I'm not sure, but I think their website is down at the moment).

I am currently building a dev box for general use, and spreading compiles across nodes seems a good idea since you see the system as one big box. Then there is the added bonus that you don't need to recompile any code....!

Regards,
Mark.
Tel: (01942)722518
Mob: (07866)070122

-----Original Message-----
From: Ron Chen [SMTP:ron_chen_123@yahoo.com]
Sent: Monday 19 November 2001 07:00
To: Scott Thomason; Beowulf@beowulf.org
Subject: Re: Compile farm?

What you need is a batch system. There are 2 free batch systems, SGE and PBS. Both of them are open source, but nevertheless, you can get 7x24 support if you are willing to pay.

PBS:
www.openpbs.com
www.pbspro.com

SGE:
www.sun.com/gridware
gridengine.sunsource.net

Also, SGE has qmake, which can execute several instances of make on multiple machines for one single make job. Install note:
http://supportforum.sun.com/gridengine/appnote_install.html

-Ron

--- Scott Thomason wrote:
> Greetings. I'm interested in setting up a shell
> account/batch process/compile farm system for our
> developers, and I'm wondering if Beowulf clusters
> are well suited to that task. We're not interested
> in writing parallel code using PVM or MPI, we just
> want to log into what appears to be one big server
> and have it dispatch the workload amongst the slave
> processors. Is Beowulf good at that?
> ---scott
>
> p.s. Sorry if there are duplicates of this message;
> I used the wrong email address earlier.
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or
> unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

__________________________________________________
Do You Yahoo!?
Find a job, post your resume. http://careers.yahoo.com

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20011119/64e93fd1/attachment.html

From math at velocet.ca  Mon Nov 19 17:14:18 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
In-Reply-To: <3450CC8673CFD411A24700105A618BD6170F69@911TURBO>; from SGaudet@turbotekcomputer.com on Mon, Nov 19, 2001 at 10:00:46AM -0500
References: <3450CC8673CFD411A24700105A618BD6170F69@911TURBO>
Message-ID: <20011119201418.L66460@velocet.ca>

On Mon, Nov 19, 2001 at 10:00:46AM -0500, Steve Gaudet's all...
> Here's some of the network GIG E hardware we'd like to recommend:
>
> AceNIC/NetGear GA620(T)/3C985B
> SysKonnect
> NS chipset:
> Cameo SOHO-GA2000T SOHO-GA2500T
> D-Link DGE-500T
> PureData PDP8023Z-TG
> SMC SMC9462TX
> NetGear GA622

How do you find the performance of these NS83820 cards? Do they do block interrupt xfer or whatever it is for more efficient xfer? How much system/interrupt time do they chew up?

/kc

From wsb at paralleldata.com  Mon Nov 19 19:28:01 2001
From: wsb at paralleldata.com (W Bauske)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
References: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net>
Message-ID: <3BF9CDC1.EC92838A@paralleldata.com>

I must have gotten lucky. I built a test system, installed RH7.2 on it, and it works fine. I only put 2 512MB ECC DIMM's in it though so perhaps that's why I didn't have so much trouble. Also tried MP vs XP processors and both perform identically with this board.
It's the fastest Athlon system I've tested.

Wes

"Donald B. Kinghorn" wrote:
>
> ...
> We'll it won't blow up with a kaboom ... According to the Mushkin web site
> you can use 2 non-registered modules on the tiger 2460 ...
>
> Now, I'll give my opinion. This board is poorly enginered and has been a
> pain make suitable for scientific computing. I've had numerous memory
> problems with these boards and most of the problems don't show up as obvious
> crashes. They are the worst kind of errors -- corrupted results for large
> jobs --- the kind of thing you might not catch without carefull testing. I
> have found that I can not reliably use more than 3 memory modules (of any
> type) on these boards. If you need a 1GB configuration DON'T USE 4 256MB
> MODULES. Try 2 512MB or 2 256 and 1 512 module.
>
> I have a cluster running reliably with no detectable errors under any load
> (FINALLY) using the tiger 2460 with the 1.03 bios using 2 256 Crucial reg
> ecc modules and 1 Infenion (64x4) reg ecc 512MB module on each board. You
> need to ENABLE quick boot in the bios, use ECC SCRUB (maybe not essential),
> use a recent (>2.4.11) kernel with a stable vm, use an append noapic in
> lilo and do NOT use any mem= appends in lilo. All of these steps may not be
> simultaneously needed but I am sick of fighting with this motherboard and
> this configuration seems to work reliably. (your milage may vary)
>
> I'm looking forward to seeing some better dual athlon boards come to the
> market. Does any one have any info on when this may happen?
>
> Best regards
> -Don
>
> >[i'm off list so please reply to me directly aswell as the list]
>
> >Has any user of the Tiger MP S2460 had experience of what happens if you
> >DON'T
> >use registered memory? Will it blow up :-) ??
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From lindahl at conservativecomputer.com  Mon Nov 19 20:17:48 2001
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
In-Reply-To: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net>; from kinghorn@pqs-chem.com on Mon, Nov 19, 2001 at 12:03:35PM -0600
References: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net>
Message-ID: <20011119231748.A2068@wumpus.foo>

On Mon, Nov 19, 2001 at 12:03:35PM -0600, Donald B. Kinghorn wrote:

> I have a cluster running reliably with no detectable errors under any load
> (FINALLY) using the tiger 2460 with the 1.03 bios using 2 256 Crucial reg
> ecc modules and 1 Infenion (64x4) reg ecc 512MB module on each board.

One thing that can kill you with memory is mixing dissimilar memory.

greg

From math at velocet.ca  Mon Nov 19 20:20:17 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
In-Reply-To: <3BF9CDC1.EC92838A@paralleldata.com>; from wsb@paralleldata.com on Mon, Nov 19, 2001 at 09:28:01PM -0600
References: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net> <3BF9CDC1.EC92838A@paralleldata.com>
Message-ID: <20011119232016.C89961@velocet.ca>

On Mon, Nov 19, 2001 at 09:28:01PM -0600, W Bauske's all...
>
> I must have gotten lucky. I built a test system, installed RH7.2 on it,
> and it works fine. I only put 2 512MB ECC DIMM's in it though so perhaps that's
> why I didn't have so much trouble. Also tried MP vs XP processors and both
> perform identically with this board.
> It's the fastest Athlon system I've tested.
>
> Wes

Have you tested your computations for subtle errors on it, compared with the MP processors? As said below there can be errors that are undetectable - how likely is this?

/kc

> "Donald B. Kinghorn" wrote:
> >
> > ...
> > We'll it won't blow up with a kaboom ... According to the Mushkin web site
> > you can use 2 non-registered modules on the tiger 2460 ...
> >
> > Now, I'll give my opinion. This board is poorly enginered and has been a
> > pain make suitable for scientific computing. I've had numerous memory
> > problems with these boards and most of the problems don't show up as obvious
> > crashes. They are the worst kind of errors -- corrupted results for large
> > jobs --- the kind of thing you might not catch without carefull testing. I
> > have found that I can not reliably use more than 3 memory modules (of any
> > type) on these boards. If you need a 1GB configuration DON'T USE 4 256MB
> > MODULES. Try 2 512MB or 2 256 and 1 512 module.
> >
> > I have a cluster running reliably with no detectable errors under any load
> > (FINALLY) using the tiger 2460 with the 1.03 bios using 2 256 Crucial reg
> > ecc modules and 1 Infenion (64x4) reg ecc 512MB module on each board. You
> > need to ENABLE quick boot in the bios, use ECC SCRUB (maybe not essential),
> > use a recent (>2.4.11) kernel with a stable vm, use an append noapic in
> > lilo and do NOT use any mem= appends in lilo. All of these steps may not be
> > simultaneously needed but I am sick of fighting with this motherboard and
> > this configuration seems to work reliably. (your milage may vary)
> >
> > I'm looking forward to seeing some better dual athlon boards come to the
> > market. Does any one have any info on when this may happen?
> >
> > Best regards
> > -Don
> >
> > >[i'm off list so please reply to me directly aswell as the list]
> >
> > >Has any user of the Tiger MP S2460 had experience of what happens if you
> > >DON'T
> > >use registered memory? Will it blow up :-) ??
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf@beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From wsb at paralleldata.com  Mon Nov 19 20:39:07 2001
From: wsb at paralleldata.com (W Bauske)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
References: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net> <3BF9CDC1.EC92838A@paralleldata.com> <20011119232016.C89961@velocet.ca>
Message-ID: <3BF9DE6B.E05B29A1@paralleldata.com>

Well, I always scan my results for the min/max values and so far they've been identical. Haven't actually subtracted the results to see if it changed. I'm picking up a couple more boards and should have them running next week. I can do more testing at that point. It's a pain to swap cpus at the moment.

Are you sure you don't have a heat problem? What sort of case are you using? Maybe your heatsink/fan is insufficient for your particular chip? There are some very high volume fan/HS combos available for Athlons. I'm using a 5400rpm fan/HS at the moment but have a couple 8000rpm fan/HS available if needed. They're awful noisy though and are a last resort.
Wes

Velocet wrote:
>
> On Mon, Nov 19, 2001 at 09:28:01PM -0600, W Bauske's all...
> >
> > I must have gotten lucky. I built a test system, installed RH7.2 on it,
> > and it works fine. I only put 2 512MB ECC DIMM's in it though so perhaps that's
> > why I didn't have so much trouble. Also tried MP vs XP processors and both
> > perform identically with this board. It's the fastest Athlon system I've tested.
> >
> > Wes
>
> Have you tested your computations for subtle errors on it, compared to
> with the MP processors? As said below there can be errors that
> are undetectable - how likely is this?
>
> /kc
>
> > "Donald B. Kinghorn" wrote:
> > >
> > > ...
> > > We'll it won't blow up with a kaboom ... According to the Mushkin web site
> > > you can use 2 non-registered modules on the tiger 2460 ...
> > >
> > > Now, I'll give my opinion. This board is poorly enginered and has been a
> > > pain make suitable for scientific computing. I've had numerous memory
> > > problems with these boards and most of the problems don't show up as obvious
> > > crashes. They are the worst kind of errors -- corrupted results for large
> > > jobs --- the kind of thing you might not catch without carefull testing. I
> > > have found that I can not reliably use more than 3 memory modules (of any
> > > type) on these boards. If you need a 1GB configuration DON'T USE 4 256MB
> > > MODULES. Try 2 512MB or 2 256 and 1 512 module.
> > >
> > > I have a cluster running reliably with no detectable errors under any load
> > > (FINALLY) using the tiger 2460 with the 1.03 bios using 2 256 Crucial reg
> > > ecc modules and 1 Infenion (64x4) reg ecc 512MB module on each board. You
> > > need to ENABLE quick boot in the bios, use ECC SCRUB (maybe not essential),
> > > use a recent (>2.4.11) kernel with a stable vm, use an append noapic in
> > > lilo and do NOT use any mem= appends in lilo. All of these steps may not be
> > > simultaneously needed but I am sick of fighting with this motherboard and
> > > this configuration seems to work reliably. (your milage may vary)
> > >
> > > I'm looking forward to seeing some better dual athlon boards come to the
> > > market. Does any one have any info on when this may happen?
> > >
> > > Best regards
> > > -Don
> > >
> > > >[i'm off list so please reply to me directly aswell as the list]
> > >
> > > >Has any user of the Tiger MP S2460 had experience of what happens if you
> > > >DON'T
> > > >use registered memory? Will it blow up :-) ??
> > >
> > > _______________________________________________
> > > Beowulf mailing list, Beowulf@beowulf.org
> > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf@beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Ken Chase, math@velocet.ca * Velocet Communications Inc.
* Toronto, CANADA

From math at velocet.ca  Mon Nov 19 21:12:11 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
In-Reply-To: <3BF9DE6B.E05B29A1@paralleldata.com>; from wsb@paralleldata.com on Mon, Nov 19, 2001 at 10:39:07PM -0600
References: <0GN200EYH7GYKX@mta4.rcsntx.swbell.net> <3BF9CDC1.EC92838A@paralleldata.com> <20011119232016.C89961@velocet.ca> <3BF9DE6B.E05B29A1@paralleldata.com>
Message-ID: <20011120001211.D89961@velocet.ca>

On Mon, Nov 19, 2001 at 10:39:07PM -0600, W Bauske's all...
>
> Well, I always scan my results for the min/max values and so far
> they've been identical. Haven't actually subtracted the results
> to see if it changed. I'm picking up a couple more boards and
> should have them running next week. I can do more testing at that
> point. It's a pain to swap cpus at the moment.
>
> Are you sure you don't have a heat problem? What sort of case are
> you using? Maybe you're heatsink/fan are insufficient for your
> particular chip? There are some very high volume fan/HS combos
> available for Athlons. I'm using a 5400rpm fan/HS at the moment but
> have a couple 8000rpm fan/HS available if needed. They're awful
> noisy though and are a last resort.

I've had no problems personally. I was just trying to get a bead on what kind of problems others have had. So far for you and me they've worked perfectly, but I spent the extra $5 and got Registered ECC DDR ram (from Crucial it's a great price, why not). They do generate a lot of heat (2 cpus) and I have 5400 RPM fans but they're in a cabinet with a lot of airflow (not sure how many CFM, but a lot of it, at 66F).

/kc

> Wes
>
> Velocet wrote:
> >
> > On Mon, Nov 19, 2001 at 09:28:01PM -0600, W Bauske's all...
> > >
> > > I must have gotten lucky. I built a test system, installed RH7.2 on it,
> > > and it works fine. I only put 2 512MB ECC DIMM's in it though so perhaps that's
> > > why I didn't have so much trouble. Also tried MP vs XP processors and both
> > > perform identically with this board. It's the fastest Athlon system I've tested.
> > >
> > > Wes
> >
> > Have you tested your computations for subtle errors on it, compared to
> > with the MP processors? As said below there can be errors that
> > are undetectable - how likely is this?
> >
> > /kc
> >
> > > "Donald B. Kinghorn" wrote:
> > > >
> > > > ...
> > > > We'll it won't blow up with a kaboom ... According to the Mushkin web site
> > > > you can use 2 non-registered modules on the tiger 2460 ...
> > > >
> > > > Now, I'll give my opinion. This board is poorly enginered and has been a
> > > > pain make suitable for scientific computing. I've had numerous memory
> > > > problems with these boards and most of the problems don't show up as obvious
> > > > crashes. They are the worst kind of errors -- corrupted results for large
> > > > jobs --- the kind of thing you might not catch without carefull testing. I
> > > > have found that I can not reliably use more than 3 memory modules (of any
> > > > type) on these boards. If you need a 1GB configuration DON'T USE 4 256MB
> > > > MODULES. Try 2 512MB or 2 256 and 1 512 module.
> > > >
> > > > I have a cluster running reliably with no detectable errors under any load
> > > > (FINALLY) using the tiger 2460 with the 1.03 bios using 2 256 Crucial reg
> > > > ecc modules and 1 Infenion (64x4) reg ecc 512MB module on each board.
You > > > > need to ENABLE quick boot in the bios, use ECC SCRUB (maybe not essential), > > > > use a recent (>2.4.11) kernel with a stable vm, use an append noapic in > > > > lilo and do NOT use any mem= appends in lilo. All of these steps may not be > > > > simultaneously needed but I am sick of fighting with this motherboard and > > > > this configuration seems to work reliably. (your milage may vary) > > > > > > > > I'm looking forward to seeing some better dual athlon boards come to the > > > > market. Does any one have any info on when this may happen? > > > > > > > > Best regards > > > > -Don > > > > > > > > >[i'm off list so please reply to me directly aswell as the list] > > > > > > > > >Has any user of the Tiger MP S2460 had experience of what happens if you > > > > >DON'T > > > > >use registered memory? Will it blow up :-) ?? > > > > > > > > _______________________________________________ > > > > Beowulf mailing list, Beowulf@beowulf.org > > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > _______________________________________________ > > > Beowulf mailing list, Beowulf@beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > > Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From jakob at unthought.net Mon Nov 19 22:03:39 2001 From: jakob at unthought.net (=?iso-8859-1?Q?Jakob_=D8stergaard?=) Date: Wed Nov 25 01:01:53 2009 Subject: [ot] Re: AMD testing In-Reply-To: <20011101222847.1136f542.zadok@phreaker.net>; from zadok@phreaker.net on Thu, Nov 01, 2001 at 10:28:47PM +0000 References: <20011101222847.1136f542.zadok@phreaker.net> Message-ID: <20011120070339.M9896@unthought.net> On Thu, Nov 01, 2001 at 10:28:47PM +0000, Hereward Cooper wrote: > [i'm off list so please reply to me directly aswell as the list] > > Has any user of the Tiger MP S2460 had experience of what happens if you DON'T > use registered memory? Will it blow up :-) ?? It should work if you only use two memory modules (max). If you need more than two modules, you need registered memory (for all modules). Apparently the chipset can't drive more than two unregistered blocks. However, on the Tiger here I couldn't use two unregistered modules at all. The board made some sequence of "beeps" at power-on, wouldn't even POST. Using registered modules instead solved the problem. (Mmmmm.... 2x2785 BogoMIPS) Mine didn't blow up. But that's hardly proof that it can't happen, so let us know if yours blow up 8) -- ................................................................ : jakob@unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob ?stergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. 
:.........................:............{Konkhra}...............:

From mcosta at fc.up.pt  Tue Nov 20 03:04:16 2001
From: mcosta at fc.up.pt (Miguel Costa)
Date: Wed Nov 25 01:01:53 2009
Subject: MPE parallel graphics
Message-ID: <3BFA38B0.3080406@fc.up.pt>

Hello again,

after finding enlightenment (no pun intended) on bpsh in my first post, I return to seek your help on a different topic:

On Scyld, when I use MPICH's mpicc with the flag -mpianim and then run the program, it displays a window with dots representing the CPUs, but then the nodes don't seem to be able to communicate with the master's X server and it crashes. Has anyone had this problem, or can anyone see why this is happening?

Thanks again,
regards,
miguel costa
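The first things to rule out here are the generic X11 ones; a sketch (not Scyld-specific; "master", the display number, and myprog are placeholders, and note that xhost + disables access control entirely, so use it only for testing):

  # on the master node, allow incoming X connections for the test
  xhost +
  # make sure the MPI processes inherit a display they can actually reach
  export DISPLAY=master:0
  mpirun -np 4 ./myprog    # myprog stands in for the -mpianim-built binary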
From SGaudet at turbotekcomputer.com  Tue Nov 20 08:06:29 2001
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
Message-ID: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO>

Hello,

> On Mon, Nov 19, 2001 at 10:00:46AM -0500, Steve Gaudet's all...
> > Here's some of the network GIG E hardware we'd like to recommend:
> >
> > AceNIC/NetGear GA620(T)/3C985B
> > SysKonnect
> > NS chipset:
> > Cameo SOHO-GA2000T SOHO-GA2500T
> > D-Link DGE-500T
> > PureData PDP8023Z-TG
> > SMC SMC9462TX
> > NetGear GA622
>
> How do you find the performance of these NS83820 cards? Do they do
> block interrupt xfer or whatever it is for more efficient
> xfer? How much
> system/interrupt time do they chew up?

The NS based stuff is low-end and cheap - you get what you pay for - but the drivers are rock solid and provide a good way to get started with GigE. On a high enough powered box, you might even get decent throughput, but it's at a cost of cycles.

The SysKonnect card, I'm told, is going EOL, so good news/bad news: they may start showing up on E-bay sometime. ;)

The most promising stuff is a Broadcom chip based 3Com card (3c996), but as of this moment, there's no Linux driver for it. That'll be the stuff to buy for next year, though.

Cheers,

Steve Gaudet
Linux Sales Engineer
..... <(???)>
===================================================================
| Turbotek Computer Corp.    tel:603-666-3062 ext. 21             |
| 8025 South Willow St.      fax:603-666-4519                     |
| Building 2, Unit 105       toll free:800-573-5393               |
| Manchester, NH 03103       e-mail:sgaudet@turbotekcomputer.com  |
| web: http://www.turbotekcomputer.com                            |
===================================================================

From ssy at prg.cpe.ku.ac.th  Tue Nov 20 08:43:04 2001
From: ssy at prg.cpe.ku.ac.th (Somsak Sriprayoonsakul)
Date: Wed Nov 25 01:01:53 2009
Subject: Compile farm?
Message-ID:

There is a program called 'ppmake' which uses a combination of make -j and PVM to distribute compilation threads to the nodes in a cluster. You can look for it at rpmfind.net or google.com (it used to be at http://www3.informatik.tu-muenchen.de/~zimmerms/ppmake/ but the link is down). You might use it with some batch scheduling systems that support PVM.

-------------------------------------------------------------------
Somsak Sriprayoonsakul
Parallel Research Group
Kasetsart University
ssy@prg.cpe.ku.ac.th
-------------------------------------------------------------------

From math at velocet.ca  Tue Nov 20 08:46:02 2001
From: math at velocet.ca ('math@velocet.ca')
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
In-Reply-To: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO>; from SGaudet@turbotekcomputer.com on Tue, Nov 20, 2001 at 11:06:29AM -0500
References: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO>
Message-ID: <20011120114601.K89961@velocet.ca>

> The NS based stuff is low-end and cheap - you get what you pay for - but the
> drivers are rock solid and provide a good way to get started with GigE.
>
> On a high enough powered box, you might even get decent throughput but it's
> at a cost of cycles.

Here's an interesting question: even with the fastest network interconnects (SCALI, etc), we don't see 100% scaling at large numbers of nodes. There is some free CPU left over. So what if you had slightly less efficient equipment? I realise it would cause a slowdown for sending out messages as the latency may be increased, but if the latency is the same as for the high end network equipment and it only costs more cycles, is it conceivable that the scaling and performance of this cluster with slightly less efficient equipment would be similar?

(Again, there's a big assumption here that we can find such equipment that has the same latency when extra cycles are involved, which may be the source of much latency for many cards in the first place.)

/kc

From lmeerkat at yahoo.com  Tue Nov 20 09:28:53 2001
From: lmeerkat at yahoo.com (L G)
Date: Wed Nov 25 01:01:53 2009
Subject: Beowulf Status Monitor information
Message-ID: <20011120172853.96241.qmail@web20606.mail.yahoo.com>

Hi,

I can't see any information in the Scyld Beowulf Status Monitor for one of my nodes; it says Memory - 0%, Swap - None, Disk - 0%, Network - 0 kBps. It is a Pentium with 64MB and a 2GB HD. This node is in the state "up" and I do have full access to it from the master machine. I think the problem is that it only has 64MB of RAM, am I right?

Thanks.

Lyudmila Gritsenko

=====

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month. http://geocities.yahoo.com/ps/info1

From Nirmal.Bissonauth at durham.ac.uk  Tue Nov 20 09:38:16 2001
From: Nirmal.Bissonauth at durham.ac.uk (Nirmal Bissonauth)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
In-Reply-To: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO>
Message-ID:

On Tue, 20 Nov 2001, Steve Gaudet wrote:

> The most promising stuff is a Broadcom chip based 3Com card (3c996), but as
> of this moment, there's no Linux driver for it. That'll be the stuff to buy
> for next year, though.
>
> Cheers,
>
> Steve Gaudet
> Linux Sales Engineer
> .....

Actually 3com has some linux drivers for the 3c996 card on their website. If anybody tries them, let us know if they are any good?

http://www.3com.com/products/en_US/result.jsp?selected=6&sort=effdt&sku=3C996-T&order=desc

Hope that was pasted ok.

Cheers
Nirmal Bissonauth

From okeefe at sistina.com  Tue Nov 20 10:46:57 2001
From: okeefe at sistina.com (Matt Okeefe)
Date: Wed Nov 25 01:01:53 2009
Subject: Compile farm?
In-Reply-To: <313680C9A886D511A06000204840E1CF3F0F3B@whq-msgusr-02.pit.comms.marconi.com>
References: <313680C9A886D511A06000204840E1CF3F0F3B@whq-msgusr-02.pit.comms.marconi.com>
Message-ID: <20011120124657.A6703@sistina.com>

On Wed, Nov 14, 2001 at 08:58:26AM -0500, Strange, John wrote:
> Well you can use mexec with mosix to get things to work, and it does work
> quite well but it doesn't scale because of some underlying filesystem
> problems we are having.
> I've got 25 machines, our backend storage currently is netapp filers, so
> using NFS I have to turn off client side caching. It basically crushes the
> filer doing constant file handling lookups. I'm still playing with a netapp
> that we have on spare, maybe I'll have some luck with finding a way around
> the problems that we are having.
>
> There is no really good backend filesystem that you can use, maybe GFS but
> it's still relatively new and too bleeding edge for practical use. (IMHO)

Actually there are a fair number of people using it in production, in some cases for nearly a year; I can give you references if you like. None have complained of data corruption due to GFS.

> Plus we don't have the hardware for it (fiber channel) and we have *NO*
> budget.

The next release of GFS will include an improved shared IP network block driver, called GNBD. You can run it over Ethernet or Myrinet, or whatever network you have.

Matt O'Keefe
Sistina Software, Inc.

> If anyone has any suggestions I would be glad to hear them.
>
> Thanks,
>
> John Strange
> Marconi
> john.ws.strange.at.marconi.com
>
> -----Original Message-----
> From: Scott Thomason [mailto:SThomaso@phmining.com]
> Sent: Friday, November 02, 2001 2:25 PM
> To: Beowulf@beowulf.org
> Subject: Compile farm?
>
> Greetings. I'm interested in setting up a shell account/batch
> process/compile farm system for our developers, and I'm wondering if Beowulf
> clusters are well suited to that task. We're not interested in writing
> parallel code using PVM or MPI, we just want to log into what appears to be
> one big server and have it dispatch the workload amongst the slave
> processors. Is Beowulf good at that?
> ---scott
>
> p.s. Sorry if there are duplicates of this message; I used the wrong email
> address earlier.
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From wsb at paralleldata.com  Tue Nov 20 12:55:07 2001
From: wsb at paralleldata.com (W Bauske)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
References: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO>
Message-ID: <3BFAC32B.4282F6D6@paralleldata.com>

Steve Gaudet wrote:
>
> Hello,
>
> > On Mon, Nov 19, 2001 at 10:00:46AM -0500, Steve Gaudet's all...
> > > Here's some of the network GIG E hardware we'd like to recommend:
> > >
> > > AceNIC/NetGear GA620(T)/3C985B
> > > SysKonnect
> > > NS chipset:
> > > Cameo SOHO-GA2000T SOHO-GA2500T
> > > D-Link DGE-500T
> > > PureData PDP8023Z-TG
> > > SMC SMC9462TX
> > > NetGear GA622
> >
> > How do you find the performance of these NS83820 cards? Do they do
> > block interrupt xfer or whatever it is for more efficient
> > xfer? How much
> > system/interrupt time do they chew up?
>
> The NS based stuff is low-end and cheap - you get what you pay for - but the
> drivers are rock solid and provide a good way to get started with GigE.
>
> On a high enough powered box, you might even get decent throughput but it's
> at a cost of cycles.

If you consider 57MB/sec for a $45 card bad, then the ns83820's are a bad deal. I consider that a good buy. CPU is around 30% of a P4 1.5GHz system, both sending and receiving sides.
It would be nice if the cpu was lower but it's acceptable for what I do. (YMMV)

Also would be nice if Gbe switches were cheaper. Latest pricing I see is $600 for an 8 port switch, or $75/port+$45/card for $120 per connection.

Wes

From math at velocet.ca  Tue Nov 20 13:25:51 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
In-Reply-To: <3BFAC32B.4282F6D6@paralleldata.com>; from wsb@paralleldata.com on Tue, Nov 20, 2001 at 02:55:07PM -0600
References: <3450CC8673CFD411A24700105A618BD6170F7A@911TURBO> <3BFAC32B.4282F6D6@paralleldata.com>
Message-ID: <20011120162551.Q89961@velocet.ca>

On Tue, Nov 20, 2001 at 02:55:07PM -0600, W Bauske's all...
> Steve Gaudet wrote:
> >
> > Hello,
> >
> > > On Mon, Nov 19, 2001 at 10:00:46AM -0500, Steve Gaudet's all...
> > > > Here's some of the network GIG E hardware we'd like to recommend:
> > > >
> > > > AceNIC/NetGear GA620(T)/3C985B
> > > > SysKonnect
> > > > NS chipset:
> > > > Cameo SOHO-GA2000T SOHO-GA2500T
> > > > D-Link DGE-500T
> > > > PureData PDP8023Z-TG
> > > > SMC SMC9462TX
> > > > NetGear GA622
> > >
> > > How do you find the performance of these NS83820 cards? Do they do
> > > block interrupt xfer or whatever it is for more efficient
> > > xfer? How much
> > > system/interrupt time do they chew up?
> >
> > The NS based stuff is low-end and cheap - you get what you pay for - but the
> > drivers are rock solid and provide a good way to get started with GigE.
> >
> > On a high enough powered box, you might even get decent throughput but it's
> > at a cost of cycles.
>
> If you consider 57MB/sec for a $45 card bad, then the ns83820's
> are a bad deal. I consider that a good buy. CPU is around
> 30% of a P4 1.5GHz system, both sending and receiving sides.

I don't know what CPU was in use at the time, but I've gotten over 350Mbps out of them with one machine:

dd if=/dev/zero bs=1M count=1000 | nc -w 1 othermachine 33333

othermachine: nc -w 1 -l -p 33333 | dd of=/dev/null

(dd on freebsd tells Bps on stderr on termination or interrupt)

> It would be nice if the cpu was lower but it's acceptable for what
> I do. (YMMV)

With gromacs going between 2 dual-athlon 1.33GHz CPUs running the d.dppc benchmark I only noticed about 2-3% system time and 87% usertime. (Linux doesn't separate interrupt and system time like freebsd does, and I was using linux in this test.)

> Also would be nice if Gbe switches were cheaper. Latest pricing I
> see is $600 for an 8 port switch, or $75/port+$45/card for $120
> per connection.

Why not just go for SCALI? Since when does COST matter on the Beowulf list? :)

/kc

--
Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA

From rock16905 at yahoo.com  Tue Nov 20 17:25:31 2001
From: rock16905 at yahoo.com (xinhuang zhang)
Date: Wed Nov 25 01:01:53 2009
Subject: (no subject)
Message-ID: <20011121012531.86532.qmail@web20808.mail.yahoo.com>

Hello;

After installing mpich and doing the test, I got the following error message. bw-05 is the host and bw-04 is one of the nodes. I hope someone can help me solve this problem. Thanks a lot!

F. rock

[biocompu@bw-05 examples]$ cd test
[biocompu@bw-05 test]$ make testing
(cd pt2pt ; ./runtests -check )
Failed to run simple program!
Output from run attempt was
*** Testing Unexpected messages ***
bash: /home/biocompu/mpich-1.2.2.3/examples/test/pt2pt/./third: No such file or directory
p0_1290: p4_error: Timeout in making connection to remote process on bw-04: 0
Connection failed for reason: : Connection refused
Connection failed for reason: : Connection refused
/home/biocompu/mpich-1.2.2.3/bin/mpirun: line 1: 1290 Broken pipe /home/biocompu/mpich-1.2.2.3/examples/test/pt2pt/./third -p4pg /home/biocompu/mpich-1.2.2.3/examples/test/pt2pt/PI1210 -p4wd /home/biocompu/mpich-1.2.2.3/examples/test/pt2pt
*** Testing Unexpected messages ***
mpirun program was /home/biocompu/mpich-1.2.2.3/bin/mpirun
mpirun command was /home/biocompu/mpich-1.2.2.3/bin/mpirun -mvhome -np 2 ./third >third.out 2>&1
make: *** [runtest] Error 1
[biocompu@bw-05 test]$

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month. http://geocities.yahoo.com/ps/info1
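The pattern in that transcript ("No such file or directory" for the test binary, then a p4 connection timeout) usually means the node cannot see the binaries at the same path, or rsh to the node is not working. A sketch of the standard first checks, reusing the hostnames and paths from the report (this assumes the default rsh-based ch_p4 startup):

  # can the master run commands on the node without a password prompt?
  rsh bw-04 date
  # does the node see the mpich tree at the same path (NFS mount or local copy)?
  rsh bw-04 ls /home/biocompu/mpich-1.2.2.3/examples/test/pt2pt/third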
From Eugene.Leitl at lrz.uni-muenchen.de  Wed Nov 21 04:32:49 2001
From: Eugene.Leitl at lrz.uni-muenchen.de (Eugene Leitl)
Date: Wed Nov 25 01:01:53 2009
Subject: LINUX PC CLUSTER AND SEISMIC MIGRATION
Message-ID:

Sorry for forwarding this late, I'm processing a large backlog.

---------- Forwarded message ----------
Date: Sun, 11 Nov 2001 18:21:41 -0600
From: Roberto Cervantes Muller
To: linuxbios@lanl.gov
Subject: info

Dear Sir:

LINUX PC CLUSTER AND SEISMIC MIGRATION

If we run a migration software package in a PC cluster environment, we distribute all tasks among the cluster nodes, and also the disk space available on each node. What would happen if one of the nodes goes down while performing a pre-stack depth migration, which might take months? Do we have to restart the process, or else what would happen to that data?

PROBABLY WILL DEPEND ON COMMUNICATION SOFTWARE:

It depends on the software.
The package I am most familiar with, MPICH, cannot recover from a failed node, and the entire parallel process must be restarted. I don't know if there are any generic software systems that can handle dynamic changes to the cluster. Otherwise, it may be possible to adjust your application so it will checkpoint itself at stages of the computation, so it can be completely restarted at a checkpoint after a node fails.

thanks and regards,
________________________________________
Roberto Cervantes Muller
Technical Manager
Tesenergy Services
E-mail: robertoc@tesenergy.net
E-fax: 1 509 696 8501
URL=http://www.tesenergy.net
________________________________________

**********************************************************************
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify me. This footnote also confirms that this email message has been swept for the presence of computer viruses.
**********************************************************************

From kinghorn at pqs-chem.com  Wed Nov 21 08:59:07 2001
From: kinghorn at pqs-chem.com (Donald B. Kinghorn)
Date: Wed Nov 25 01:01:53 2009
Subject: Tiger MP S2460 was [ot] Re: AMD testing
Message-ID: <0GN5005RCTTOGB@mta4.rcsntx.swbell.net>

... A couple more notes ...

The memory I am using: I bought a couple hundred Crucial 256MB reg ECC PC2100 modules, so I'm using 2 of those on each board, and I've added 1 Infineon 64x4 512MB reg ECC module to each board. (I'm not sure if the Infineon based modules are made by Infineon or someone else ... I got them from MicroPro for $109, but they have gone up to ~$135.) Greg Lindahl mentioned that it's not good to mix modules, and I agree, but what I've got in these boxes seems to be working OK.

Kernel: I'm using a stock Mandrake 2.4.13 kernel build ... the AMD7411 only gets set to ata33 (DMA mode 2, I believe). I have a bunch of machines that I've forced to ata100 by passing ide0=ata66 on the lilo append line; this forces the controller into DMA mode 5 (ata100). It seems to be stable, but it's too early to say for sure. I've seen kernel patches that I think will detect and set up the controller correctly, but I haven't tried them. The patch (fix) may be in the 2.4.14/15 source, but I'm not sure. (?)

Also, I just read a review of new motherboards shown at the COMDEX show ... there are a bunch of new dual Athlon boards in the mix ... so it looks like we'll have more choices soon. Hurray!
http://www.anandtech.com/mb/showdoc.html?i=1560&p=1

Best regards
-Don

Dr. Donald B. Kinghorn
Parallel Quantum Solutions LLC
http://www.pqs-chem.com

From jharrop at shaw.ca  Thu Nov 22 10:01:03 2001
From: jharrop at shaw.ca (4j harrop)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
Message-ID: <5.0.2.1.0.20011122100055.00aa0520@pop3.norton.antivirus>

Hi,

I've been a lurker on this list for some time. The conversations here have been most helpful while I've been working on getting up to speed. I have recently built a small beowulf cluster and am now looking at getting a FORTRAN90 compiler. Can anyone on the list recommend which are better for Linux (Redhat 7.2) using mpich (1.2.2.3)?

If you have negative comments that you would rather not publish to the list, please contact me directly at jharrop@shaw.ca

Thanks in advance!
John Harrop
Adapt Systems Corp
Cyberquest Geoscience Ltd

From ron_chen_123 at yahoo.com  Thu Nov 22 17:55:00 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:53 2009
Subject: [PBS-USERS] big cluster
In-Reply-To:
Message-ID: <20011123015500.41676.qmail@web14703.mail.yahoo.com>

"... NCSA runs Maui on 512 node PBS Linux cluster."
See: http://www.supercluster.org/main.html

You may want to apply the scaling patch so that PBS can scale beyond 500 hosts:
See: http://www-unix.mcs.anl.gov/openpbs/

I've heard many rumors about SGE and PBS. Looks like there is a company spreading the rumors:
http://supportforum.sun.com/cgi-bin/WebX.cgi?13@217.dvcxaQuMfpL^0@.ee8e727

Or if you can't get the page, follow:
http://www.sun.com/software/gridware/support.html
Technical Forums -> Compute Farms -> some comments overheard by Platform Computing rep.

-Ron

--- Tamar Domany wrote:
>
> I heard a rumor that PBS has a scalability problem
> when working with more
> then a 200 compute nodes.
> Is that true ?
> Does any one has a experience ( good or bad ) with
> cluster that size or
> bigger ?
>
> Thanks
> Tamar
>
> __________________________________________________________________________
> To unsubscribe: email majordomo@openpbs.org with
> body "unsubscribe pbs-users"
> For message archives: visit
> http://openpbs.org/UserArea/pbs-users.html
> - - - - - - - - - - - - - -
> Academic Site? Use PBS Pro free, see:
> http://www.pbspro.com/academia.html
> OpenPBS and the pbs-users mailing list are sponsored
> by Veridian.
> __________________________________________________________________________

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month. http://geocities.yahoo.com/ps/info1

From yoon at bh.kyungpook.ac.kr  Thu Nov 22 23:22:35 2001
From: yoon at bh.kyungpook.ac.kr (Yoon Jae Ho)
Date: Wed Nov 25 01:01:53 2009
Subject: 32 bit vs 64 bit computer ?
Message-ID: <001d01c173ef$a16ae600$5f72f2cb@LocalHost>

I want to know the exact definition of a 32-bit computer (PC) vs a 64-bit computer, and why we haven't been able to make a 128-bit computer for such a long time.

I also don't know the maximum number a 32-bit vs a 64-bit computer can calculate with exactly, without error.

With different architecture PCs, for example AMD, Intel, and Mac CPUs, is it possible to communicate the calculation results to each other?

And with the same OS, for example Linux, is it possible to make one Beowulf using Alpha (64-bit) & Intel (32-bit) computers? I mean, can we communicate the calculation results with each other (32-bit vs 64-bit) during calculation with the same OS?

Thank you very much

---------------------------------------------------------------------
Yoon Jae Ho  Economist  POSCO Research Institute
yoon@bh.kyungpook.ac.kr  jhyoon@mail.posri.re.kr
http://ie.korea.ac.kr/~supercom/  Korea Beowulf Supercomputer
http://members.ud.com/services/teams/team.htm?id=264C68D5-CB71-429F-923D-8614F419065D  Help the people with your PC
Imagination is more important than knowledge.  A. Einstein
http://www.kichun.co.kr 2001.1.6
http://www.c3tv.com 2001.1.10
------------------------------------------------------------------------

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20011123/ba5cf820/attachment.html

From jakob at unthought.net  Fri Nov 23 00:57:33 2001
From: jakob at unthought.net (Jakob Østergaard)
Date: Wed Nov 25 01:01:53 2009
Subject: 32 bit vs 64 bit computer ?
In-Reply-To: <001d01c173ef$a16ae600$5f72f2cb@LocalHost>; from yoon@bh.kyungpook.ac.kr on Fri, Nov 23, 2001 at 04:22:35PM +0900
References: <001d01c173ef$a16ae600$5f72f2cb@LocalHost>
Message-ID: <20011123095733.A9896@unthought.net>

On Fri, Nov 23, 2001 at 04:22:35PM +0900, Yoon Jae Ho wrote:
>
> I want to know the exact definition of a 32-bit computer (PC) vs a 64-bit computer,
> and why we haven't been able to make a 128-bit computer for such a long time.

*Usually* these bits refer to the addressing capability of the machine. A 32-bit machine can address a 32-bit memory space, meaning 2^32 bytes, or 4 GB.

Now, current 32-bit Intel machines actually contain some hacks so that the CPU can address more than 32 bits. One process can still only address a 32-bit space though (yes, I know you can do windowing/mmap hacks to sort-of address more, but the process will still live in one 32-bit address space).

A 64-bit machine can address a 64-bit memory space. I suppose that's around 16 exabytes or something like that. It's the ridiculous amount of ~10^19 bytes.

Now, a 128-bit machine would address around 10^38 bytes. There's something like 10^86 elementary particles in the known parts of the universe - building a machine with an actual 128-bit physical address space is going to be challenging with today's technology, to say the least :)

> I also don't know the maximum number a 32-bit vs a 64-bit computer can calculate with exactly, without error.

If you use floating point, you usually use "float" or "double" types. Those have been 32 bits (float) and 64 bits (double) on all 32-bit and 64-bit systems regardless, forever. It's an IEEE standard.

> With different architecture PCs, for example AMD, Intel, and Mac CPUs, is it possible to communicate the calculation results to each other?

Communication happens with a protocol. If your protocol is standardised among platforms, you can. If you didn't make your protocol work between different machines, you can't.

> And with the same OS, for example Linux, is it possible to make one Beowulf using Alpha (64-bit) & Intel (32-bit) computers?

Sure, it's possible. Now, many parallel applications will either use a protocol that is not "safe" between different architectures, or the application will depend on special numerical properties of specific architectures. Mixing architectures can give some headaches there. But then again, it would be trivial to make sure that parallel jobs only execute on one particular architecture.

Whether it's desirable to mix architectures depends entirely on what kind of applications you are planning to run. Diversity can be as useful as it can be painful. It all depends...

> I mean, can we communicate the calculation results with each other (32-bit vs 64-bit) during calculation with the same OS?

Again, communication happens over a protocol. If your protocol can make it work, it will work. If your protocol cannot make it work, it cannot work - operating systems do not matter here.

--
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:
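Two standard commands make the 32-bit/64-bit distinction concrete on any Linux box (nothing exotic assumed here; /bin/ls is just a convenient binary to inspect):

  # word size the userland was built for: prints 32 or 64
  getconf LONG_BIT
  # reports "ELF 32-bit ..." or "ELF 64-bit ..." for a given executable
  file /bin/ls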
From bdorland at kendall.umd.edu  Fri Nov 23 01:16:03 2001
From: bdorland at kendall.umd.edu (Bill Dorland)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
In-Reply-To: <5.0.2.1.0.20011122100055.00aa0520@pop3.norton.antivirus> (message from 4j harrop on Thu, 22 Nov 2001 10:01:03 -0800)
References: <5.0.2.1.0.20011122100055.00aa0520@pop3.norton.antivirus>
Message-ID: <200111230916.fAN9G3W30351@kendall.umd.edu>

> I have recently built a small beowulf cluster and am now looking at
> getting a FORTRAN90 compiler. Can anyone on the list recommend
> which are better for Linux (Redhat 7.2) using mpich (1.2.2.3)?

I've tested three Fortran 90 compilers in this basic environment, on a suite of scientific codes. They are the Portland Group's f90, NAG f95, and Lahey/Fujitsu's lf95. I also tried the Portland Group HPF compiler.

I have found the Portland Group products to be heavily bug-ridden, and essentially unusable by a group of scientists that are actively developing code that uses Fortran 90 (or HPF) features. Moreover, carefully constructed bug reports submitted to the company failed to stir them. I strongly advise avoiding this company. My colleagues at an American national laboratory independently came to the same conclusions, based on their problems with the PG products.

The other two compilers, on the other hand, are both very good. My colleagues and I are fully satisfied with the performance and compatibility with the Fortran 90/95 standards of both. I expect that either would perform well for you.

I haven't tried Absoft's f90 compiler, but I will do so next week. Let me know if you are interested in the results.

--Bill

From Daniel.Kidger at quadrics.com  Fri Nov 23 01:56:37 2001
From: Daniel.Kidger at quadrics.com (Daniel Kidger)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA739D338@stegosaurus.bristol.quadrics.com>

>> I have recently built a small beowulf cluster and am now looking at
>> getting a Fortran90 compiler. Can anyone on the list recommend
>> which are better for Linux (Redhat 7.2) using mpich (1.2.2.3)?
>
>>I've tested three Fortran 90 compilers in this basic environment, on a
>>suite of scientific codes. They are the Portland Group's f90, NAG
>>f95, and Lahey/Fujitsu's lf95. I also tried the Portland Group HPF
>>compiler.
>
>>I have found the Portland Group products to be heavily bug-ridden, and
>>essentially unusable by a group of scientists that are actively
>>developing code that uses Fortran 90 (or HPF) features. Moreover,
>>carefully constructed bug reports submitted to the company failed to
>>stir them. I strongly advise avoiding this company.

I would be very careful about your damning of Portland. It is widely used and has a large base of users, so expect some flames!

However, you do not mention the Intel Compiler. In virtually all our tests on dual Pentium 4s, it outperformed the others that we tried. Also it works fine with mpich (apart from the fact that you need to build mpich to expect only a single underscore on subroutines).

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.
From bdorland at kendall.umd.edu Fri Nov 23 02:50:09 2001
From: bdorland at kendall.umd.edu (Bill Dorland)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: <010C86D15E4D1247B9A5DD312B7F5AA739D338@stegosaurus.bristol.quadrics.com> (message from Daniel Kidger on Fri, 23 Nov 2001 09:56:37 -0000)
References: <010C86D15E4D1247B9A5DD312B7F5AA739D338@stegosaurus.bristol.quadrics.com>
Message-ID: <200111231050.fANAo9c30456@kendall.umd.edu>

> However you do not mention the Intel Compiler. In virtually all our
> tests on dual Pentium 4s, it outperformed the others that we tried.

I have never used the Intel compiler. Our cluster (Imperial College, London) is built around AMD Athlons. Is the Intel compiler compatible with Athlons?

--Bill

From Daniel.Kidger at quadrics.com Fri Nov 23 03:29:10 2001
From: Daniel.Kidger at quadrics.com (Daniel Kidger)
Date: Wed Nov 25 01:01:53 2009
Subject: 32 bit vs 64 bit computer ?
Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA739D33B@stegosaurus.bristol.quadrics.com>

> I want to know the exact definition of the 32 bit computer (PC) vs 64 bit computer.
> I don't know how much (the maximum number) the 32 bit computer vs 64 bit makes exact calculation without error.

It is a common misunderstanding to equate a 32-bit computer with 32-bit numbers in calculations. For example, my old ZX Spectrum was an 8-bit computer (and so could only address 64kB of memory) but stored floating point numbers in 40 bits. What can also add to the confusion is that Intel Pentiums (which are 32-bit machines) have always had 64-bit floating point numbers, but internal to the CPU floating point units they are stored as 80 bits.

> And with different architecture PCs - for example AMD, Intel, Mac CPU - is it possible to communicate the calculation results to each other?
> And with the same OS - for example Linux - is it possible to make one beowulf using Alpha (64 bit) & Intel (32 bit) computers?

Your other question was about communication between heterogeneous architectures. This again has always been possible. Before MPI (unfortunately) came to dominate message-passing, PVM was the standard library used. PVM is designed for heterogeneous systems. For example, I have a code that uses MPI internally on both a Cray T3E and also a Fujitsu Vector Processor but which uses PVM to communicate between the two big machines.

Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger@quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20011123/93058e45/attachment.html

From jcownie at etnus.com Fri Nov 23 03:42:10 2001
From: jcownie at etnus.com (James Cownie)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: Message from Bill Dorland of "Fri, 23 Nov 2001 05:50:09 EST." <200111231050.fANAo9c30456@kendall.umd.edu>
Message-ID: <167Eih-0SK-00@etnus.com>

> I have never used the Intel compiler. Our cluster (Imperial
> College, London) is built around AMD Athlons. Is the Intel compiler
> compatible with Athlons?
At higher optimisation levels, when it is compiling for the Pentium 4 it will generate SSE-2 instructions, which are not implemented on the Athlons (yet). I'm not sure whether the license _allows_ you to use it to compile for Athlons, or whether it checks somewhere in the runtime to ensure that you don't...

You can download it and try it at only the cost of your time (and it may remain free to you as an academic even for production use). It's easy to find on Intel's site if you want to play with it.

-- Jim
James Cownie
Etnus, LLC.    +44 117 9071438
http://www.etnus.com

From jcownie at etnus.com Fri Nov 23 04:27:12 2001
From: jcownie at etnus.com (James Cownie)
Date: Wed Nov 25 01:01:53 2009
Subject: 32 bit vs 64 bit computer ?
In-Reply-To: Your message of "Fri, 23 Nov 2001 11:29:10 GMT." <010C86D15E4D1247B9A5DD312B7F5AA739D33B@stegosaurus.bristol.quadrics.com>
Message-ID: <167FQG-0Ta-00@etnus.com>

> Before MPI (unfortunately) came to dominate message-passing, PVM was
> the standard library used. PVM is designed for heterogeneous
> systems. For example I have a code that uses MPI internally on both a
> Cray T3E and also a Fujitsu Vector Processor but which uses PVM to
> communicate between the two big machines.

Despite the implication above that MPI is inferior to PVM in its support of heterogeneous systems, the MPI standard _was_ designed for heterogeneous systems. A conforming MPI program provides enough information on both send and receive to allow the MPI implementation to translate data between machine formats (without requiring a function call per data element to achieve it, as PVM used to do!).

The issue which is likely preventing you from exploiting this is that of _starting_ MPI processes on these two different machines and exploiting the vendor-optimised MPI on both of them. Since Cray has no incentive to make their MPI handle a Fujitsu VPP, and Fujitsu has no incentive to make their MPI handle a Cray T3E, interoperability of _vendor optimised_ MPIs is small. (Though, of course, your Quadrics MPI will work in an optimised fashion with the Fujitsu VPP and T3E, I expect :-)

However, if you're prepared to use a portable MPI such as MPICH, then you can easily handle heterogeneous machines inside a single program. (See the Globus/MPI work, for instance.) I have also seen work which used the MPI profiling interface to wrap a vendor MPI so that it would inter-operate with a portable MPI.

So, in summary:
1) The MPI specification fully supports heterogeneity.
2) There are MPI implementations which support heterogeneity.
3) You're living in another universe if you think that vendors will spend any time making their MPI implementations inter-operate off-box with their competitors, rather than tweaking their on-box performance in the hope of wiping out said competitors !

-- Jim
James Cownie
Etnus, LLC.    +44 117 9071438
http://www.etnus.com
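Point (1) above is visible in the API itself: every send and receive names an MPI datatype rather than a raw byte count, which is precisely the information a heterogeneous implementation needs to convert representations in flight. A minimal sketch, not tied to any particular vendor's MPI (build with mpicc, run with mpirun -np 2):

#include <mpi.h>
#include <stdio.h>

/* Rank 0 sends a double to rank 1.  Because both calls declare
 * MPI_DOUBLE instead of "8 bytes", an MPI built for mixed
 * architectures can translate the wire format en route. */
int main(int argc, char **argv)
{
    int rank;
    double x = 2.718281828;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %f\n", x);
    }
    MPI_Finalize();
    return 0;
}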
From Tobias.Peuker at materna.de Fri Nov 23 05:36:51 2001
From: Tobias.Peuker at materna.de (Tobias.Peuker@materna.de)
Date: Wed Nov 25 01:01:53 2009
Subject: Problem with Sun Grid Engines qrsh
Message-ID: <01A24CDFE59DD411899F00A0C91012A96B4570@chewbacca.materna.de>

Hello,

I have a little problem. I am setting up a parallel compiling farm with SGE qmake. But I have a little problem that I can't solve:

When I try to use the qrsh command from SGE, the following error message occurs:

bash: ulimit: cannot modify limit: Operation not permitted

Normal RSH and everything else works perfectly. Does anybody have an idea how to solve this problem?

Regards,
Tobi

From steveb at aei-potsdam.mpg.de Fri Nov 23 05:46:38 2001
From: steveb at aei-potsdam.mpg.de (Steven Berukoff)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: <200111231050.fANAo9c30456@kendall.umd.edu>
Message-ID:

Yes, you can use the Intel compilers to compile code for Athlons. Since the AMD instruction set supports SSE, you can include Pentium 3 optimizations that improve performance a bit.

What I'd really like to see, however, is gcc for Athlon or, better, a compiler from AMD!

Cheers
Steve

> > However you do not mention the Intel Compiler. In virtually all our
> > tests on dual Pentium 4s, it outperformed the others that we tried.
>
> I have never used the Intel compiler. Our cluster (Imperial College,
> London) is built around AMD Athlons. Is the Intel compiler compatible
> with Athlons?
>
> --Bill
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

=====
Steve Berukoff                          tel: 49-331-5677233
Albert-Einstein-Institute               fax: 49-331-5677298
Am Muehlenberg 1, D14477 Golm, Germany  email: steveb@aei.mpg.de

From Florent.Calvayrac at univ-lemans.fr Fri Nov 23 06:47:13 2001
From: Florent.Calvayrac at univ-lemans.fr (Florent.Calvayrac)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
In-Reply-To: <200111230916.fAN9G3W30351@kendall.umd.edu> from "Bill Dorland" at Nov 23, 2001 04:16:03 AM
Message-ID: <200111231447.PAA13980@pecbip1.univ-lemans.fr>

> I have found the Portland Group products to be heavily bug-ridden, and
> essentially unusable by a group of scientists that are actively
> developing code that uses Fortran 90 (or HPF) features. Moreover,

Right: compilation and test of LINPACK on our system gives at least a 50% failure rate on the precision tests with pgf77, versus 0% with g77. However, when it works, the generated code is at least 20% faster than with other compilers.

To reply to another message, the PlayStation 2 has a 128-bit processor....

--
Florent Calvayrac                        | Laboratoire de Physique de l'Etat Condense
UMR-CNRS 6087                            | http://www.univ-lemans.fr/~fcalvay
Universite du Maine-Faculte des Sciences | 72085 Le Mans Cedex 9

From ctierney at hpti.com Fri Nov 23 07:36:38 2001
From: ctierney at hpti.com (Craig Tierney)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: <200111231050.fANAo9c30456@kendall.umd.edu>; from bdorland@kendall.umd.edu on Fri, Nov 23, 2001 at 05:50:09AM -0500
References: <010C86D15E4D1247B9A5DD312B7F5AA739D338@stegosaurus.bristol.quadrics.com> <200111231050.fANAo9c30456@kendall.umd.edu>
Message-ID: <20011123083638.A8562@hpti.com>

On Fri, Nov 23, 2001 at 05:50:09AM -0500, Bill Dorland wrote:
>
> > However you do not mention the Intel Compiler. In virtually all our
> > tests on dual Pentium 4s, it outperformed the others that we tried.
>
> I have never used the Intel compiler. Our cluster (Imperial College,
> London) is built around AMD Athlons. Is the Intel compiler compatible
> with Athlons?
>
> --Bill

I tested out a dual Athlon and a dual P4 system with the Portland Group and Intel Fortran compilers. Yes, you can run the Intel compiler on the AMD. It works quite well. The results with my code showed that the Intel compiler was faster on both platforms. Your mileage may vary.

The only problem with the Intel compiler is that I have had some problems getting it to take some F77 code that other compilers can handle.
I usually can work around the internal compiler errors that the Intel system generates; it just takes a little time to find them.

Craig

> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Craig Tierney (ctierney@hpti.com)

From ctierney at hpti.com Fri Nov 23 08:00:21 2001
From: ctierney at hpti.com (Craig Tierney)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: ; from steveb@aei-potsdam.mpg.de on Fri, Nov 23, 2001 at 02:46:38PM +0100
References: <200111231050.fANAo9c30456@kendall.umd.edu>
Message-ID: <20011123090021.A8761@hpti.com>

On Fri, Nov 23, 2001 at 02:46:38PM +0100, Steven Berukoff wrote:
>
> Yes, you can use the Intel compilers to compile code for Athlons. Since
> the AMD instruction set supports SSE, you can include Pentium 3
> optimizations that improve performance a bit.

Does anyone know how similarly or differently the SSE instructions are implemented on Athlon vs. P3/P4 chips? Are the operation counts the same, or is one slower than the other?

>
> What I'd really like to see, however, is gcc for Athlon or, better, a
> compiler from AMD!

An AMD compiler would be nice, but it is not going to happen (opinion, not fact). However, an easy way for them to achieve this is to offer $$$$ to any compiler vendor to implement the 3DNow instructions natively.

Craig

>
> Cheers
> Steve
>
> > > However you do not mention the Intel Compiler. In virtually all our
> > > tests on dual Pentium 4s, it outperformed the others that we tried.
> >
> > I have never used the Intel compiler. Our cluster (Imperial College,
> > London) is built around AMD Athlons. Is the Intel compiler compatible
> > with Athlons?
> >
> > --Bill
> > _______________________________________________
> > Beowulf mailing list, Beowulf@beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
> =====
> Steve Berukoff                          tel: 49-331-5677233
> Albert-Einstein-Institute               fax: 49-331-5677298
> Am Muehlenberg 1, D14477 Golm, Germany  email: steveb@aei.mpg.de
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Craig Tierney (ctierney@hpti.com)

From rauch at inf.ethz.ch Fri Nov 23 08:51:33 2001
From: rauch at inf.ethz.ch (Felix Rauch)
Date: Wed Nov 25 01:01:53 2009
Subject: Gigabit Ethernet switches and network adaptors.
In-Reply-To: <20011119201418.L66460@velocet.ca>
Message-ID:

On Mon, 19 Nov 2001, Velocet wrote:
> On Mon, Nov 19, 2001 at 10:00:46AM -0500, Steve Gaudet's all...
> > Here's some of the network GigE hardware we'd like to recommend:
> >
> > AceNIC/NetGear GA620(T)/3C985B
> > SysKonnect
> > NS chipset:
> >   Cameo SOHO-GA2000T SOHO-GA2500T
> >   D-Link DGE-500T
> >   PureData PDP8023Z-TG
> >   SMC SMC9462TX
> >   NetGear GA622
>
> How do you find the performance of these NS82830 cards? Do they do
> block interrupt xfer or whatever it is for more efficient xfer? How much
> system/interrupt time do they chew up?

I don't know about the 82830, but a student in our group is working on a (special) driver for the DP83820 chip on an ASANTE GigaNIX card. While the cards were cheap and have a rich feature set, there are mainly two problems as far as we can see:

- The card has hardware bugs.
The student discovered 3 bugs, but could fortunately work around them.

- The FIFOs on the card are very small (8 KB TX and 32 KB RX, if I remember correctly). The student had to fiddle quite a bit with some of the parameters of the card to get acceptable performance. This might also be responsible for the relatively low throughput (it could also be the implementation of the DMA engine). The card seems unable to transfer more than about 70-80 MB/s _without_ any protocol stack (the sender's driver just transmits the same data over and over, while the receiver simply marks received packets as `handled').

As a comparison: our hamachi cards transfer more than 100 MB/s _with_ TCP/IP on the same machines!

So, in our experience the DP83820-based cards are not the best, but they work.

- Felix

--
Felix Rauch                      | Email: rauch@inf.ethz.ch
Institute for Computer Systems   | Homepage: http://www.cs.inf.ethz.ch/~rauch/
ETH Zentrum / RZ H18             | Phone: ++41 1 632 7489
CH - 8092 Zuerich / Switzerland  | Fax: ++41 1 632 1307

From Daniel.Kidger at quadrics.com Fri Nov 23 10:15:30 2001
From: Daniel.Kidger at quadrics.com (Daniel Kidger)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
Message-ID: <010C86D15E4D1247B9A5DD312B7F5AA739D33C@stegosaurus.bristol.quadrics.com>

> Right: compilation and test of LINPACK on our system
> gives at least a 50% failure rate on the precision tests with pgf77,
> versus 0% with g77. However, when it works, the
> generated code is at least 20% faster than with
> other compilers.

That does not prove that pgf77 is broken! What if linpack (./xhpl) has a bug whereby a variable is not initialised to zero? pgf77 may be acting correctly by not having to initialise it, and g77 may be over-keen in setting all undeclared values to zero.

Yours,
Daniel.

(ps. yes, I _do_ suspect this is actually true - I have spurious problems with the Intel compiler on xhpl too)

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger@quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

From djholm at fnal.gov Fri Nov 23 11:10:59 2001
From: djholm at fnal.gov (Don Holmgren)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To: <20011123090021.A8761@hpti.com>
Message-ID:

On Fri, 23 Nov 2001, Craig Tierney wrote:
> On Fri, Nov 23, 2001 at 02:46:38PM +0100, Steven Berukoff wrote:
> >
> > Yes, you can use the Intel compilers to compile code for Athlons. Since
> > the AMD instruction set supports SSE, you can include Pentium 3
> > optimizations that improve performance a bit.
>
> Does anyone know how similarly or differently the SSE instructions are
> implemented on Athlon vs. P3/P4 chips? Are the operation counts
> the same, or is one slower than the other?

At the very bottom of the page

    http://qcdhome.fnal.gov/sse/

I have a table with cycle counts posted for a number of matrix-matrix and matrix-vector routines as measured on a P-III (Coppermine), P4, and an Athlon MP. Times are posted for both a pure-C version of each routine, built with gcc, as well as for an SSE version. The sources for each are available at

    http://qcdhome.fnal.gov/sse/catalog.html

The results are a mixed bag, with each flavor of processor sometimes first, second, or third. I'm using only a small subset of SSE - mostly shufps, addps, mulps, with a few xorps, movaps, and movups thrown in. I haven't timed individual instructions on all three processors.
Don Holmgren
Fermilab

From jcandy1 at san.rr.com Fri Nov 23 13:13:19 2001
From: jcandy1 at san.rr.com (Jeff Candy)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
References: <5.0.2.1.0.20011122100055.00aa0520@pop3.norton.antivirus> <200111230916.fAN9G3W30351@kendall.umd.edu>
Message-ID: <3BFEBBEF.86CA2754@san.rr.com>

Bill Dorland wrote:

> I've tested three Fortran 90 compilers in this basic environment, on a
> suite of scientific codes. They are the Portland Group's f90, NAG
> f95, and Lahey/Fujitsu's lf95. I also tried the Portland Group HPF
> compiler.
>
> I have found the Portland Group products to be heavily bug-ridden, and
> essentially unusable by a group of scientists that are actively
> developing code that uses Fortran 90 (or HPF) features. Moreover,
> carefully constructed bug reports submitted to the company failed to
> stir them. I strongly advise avoiding this company. My colleagues at
> an American national laboratory independently came to the same
> conclusions, based on their problems with the PG products.
>
> The other two compilers, on the other hand, are both very good. My
> colleagues and I are fully satisfied with the performance of both,
> and with their compatibility with the Fortran 90/95 standards. I
> expect that either would perform well for you.

I have grown increasingly unhappy with The Portland Group and its compilers over the last year. In comparison with the Lahey/Fujitsu product (lf95), for example, the quality of syntax and run-time error-checking is worse, and license management is more tedious. Code generated with pgf90 tends to be slightly faster, but not by any amount that would recommend its use. I believe an average user will produce bug-free code faster with lf95 than with pgf90.

Jeff

From serguei.patchkovskii at sympatico.ca Fri Nov 23 14:31:28 2001
From: serguei.patchkovskii at sympatico.ca (serguei.patchkovskii@sympatico.ca)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
Message-ID: <20011123223128.EJXU24249.tomts11-srv.bellnexxia.net@[209.226.175.18]>

> What if linpack (./xhpl) has a bug whereby a variable
> is not initialised to zero?
>
> pgf77 may be acting correctly by not having
> to initialise it, and g77 may be over-keen in
> setting all undeclared values to zero.

I strongly suspect that adding "-pc64 -Kieee" to your compilation options will allow the tests to complete.

Serguei

From bjornfot at erix.ericsson.se Mon Nov 19 00:49:18 2001
From: bjornfot at erix.ericsson.se (Lars Björnfot)
Date: Wed Nov 25 01:01:53 2009
Subject: Compilation problem.
References: <01b501c16bc3$02c6c3e0$906a7080@divine>
Message-ID: <3BF8C78E.901668D0@erix.ericsson.se>

Hi,

I posted an answer for this some months ago; the patch is added below. Hope it works, though the versions are slightly newer.

Regards,
Lars

> "Zhifeng F. Chen" wrote:
> > Hi,
> > When compiling mvich-1.0a6.1 under mpich-1.2.2.3,
> > ./configure --with-device=via --with-arch=LINUX --without-romio -cflags="-DUSE_STDARG -O2 -DCPU_X86 -DNIC_GIGANET -DVIPL095" -lib="-lgnivipl -lpthread"
> > is fine.
> > When I came to make, it reports:
> > cc1: warnings being treated as errors
> > queue.c: In function `MPID_Search_unexpected_for_request':
> > queue.c:296: warning: implicit declaration of function `MPID_AINT_CMP'
> > make[3]: *** [queue.o] Error 1
> > Exit status from make was 2
> > make[2]: *** [mpilib] Error 1
> > make[1]: *** [mpi-modules] Error 2
> > make: *** [mpi] Error 2
> > Can anyone help me out?
> > ZF

The reason seems to be mpid.h, which exists in two versions; the mpid/via/mpid.h seems outdated.
I send a patch that works for me (mpich-1.2.1 and mvich-1.0a6.1). It's rough, just to get it to compile.

make mpilib    # fails:
               # queue.c:296: warning: implicit declaration of function `MPID_AINT_CMP'
               # see: diff ./mpid/ch2/mpid.h ./mpid/via/mpid.h
patch -p1 < patch-mpid.h
make mpilib    # succeeds w/o errors.

Regards,
Lars

> Jeffrey Tilson wrote:
> Hi,
> This is my first attempt with mvich (1.0a6.1). I'm using mpich 1.2.2. I have a
> small Emulex (cLAN 1000) connected cluster running RH 6.2/2.2.19. I've pretty
> much followed the mvich installation instructions. The problem is the function
> MPID_AINT_CMP. It doesn't appear to be defined anywhere, nor used by any code
> other than queue.c. Can someone suggest a solution to this?
> Thanks,
> --jeff

*** mpich-1.2.1/mpid/via/mpid.h.orig	Tue Jul  4 01:58:12 2000
--- mpich-1.2.1/mpid/via/mpid.h	Wed Jun 20 23:57:51 2001
***************
*** 99,108 ****
--- 99,110 ----
  typedef int MPID_Aint;
  #define MPID_AINT_SET(a,b) a = b
  #define MPID_AINT_GET(a,b) a = b
+ #define MPID_AINT_CMP(a,b) (a) == (b)
  #elif defined(MPID_LONG8)
  typedef long MPID_Aint;
  #define MPID_AINT_SET(a,b) a = b
  #define MPID_AINT_GET(a,b) a = b
+ #define MPID_AINT_CMP(a,b) (a) == (b)
  #else
  #define MPID_AINT_IS_STRUCT
  /* This is complicated by the need to set only the significant bits when
***************
*** 115,123 ****
--- 117,127 ----
  #ifndef POINTER_64_BITS
  #define MPID_AINT_SET(a,b) (a).low = (unsigned)(b)
  #define MPID_AINT_GET(a,b) (a) = (void *)(b).low
+ #define MPID_AINT_CMP(a,b) ((a).low == (b).low)
  #else
  #define MPID_AINT_SET(a,b) (a) = *(MPID_Aint *)&(b)
  #define MPID_AINT_GET(a,b) *(MPID_Aint *)&(a) = *&(b)
+ #define MPID_AINT_CMP(a,b) ((a).low == (b).low) && ((a).high == (b).high)
  #endif
  #endif
  #else /* Not MPID_HAS_HETERO */
***************
*** 131,136 ****
--- 135,141 ----
  {
  a = b;\
  DEBUG_H_INT(fprintf( stderr, "[%d] Aint get %x <- %x\n", MPID_MyWorldRank, a, b ));\
  }
+ #define MPID_AINT_CMP(a,b) (a) == (b)
  #endif
  typedef int MPID_RNDV_T;

From jyrki.huusko at vtt.fi Wed Nov 21 06:06:49 2001
From: jyrki.huusko at vtt.fi (Jyrki Huusko)
Date: Wed Nov 25 01:01:53 2009
Subject: Network simulator2 + Beowulf
Message-ID: <4.3.2.7.2.20011121155636.00f33e80@elemail.ele.vtt.fi>

Good day,

Has anyone used NS2 - Network Simulator - on a Beowulf? In other words, has anyone tried to parallelise the NS2 simulator using MPI and run it in a distributed environment? We are currently planning to develop a network simulator (like Opnet, GlomoSIM and NS2) for distributed computer systems (mainly Beowulf-type clusters), and thus we are quite interested in work already done in this field of study... if there is any information freely available....

Sincerely Yours,
Jyrki Huusko

"I think there's a world market for about five computers."
 -Thomas Watson (IBM)-

--
Jyrki Huusko, jyrki.huusko@vtt.fi
Kaitoväylä 1, P.O.BOX 1100, FIN-90571 OULU, FINLAND
Tel. +358 8 551 2111, Fax +358 8 551 2320
http://www.vtt.fi  http://www.willab.fi/telaketju

From jharrop at shaw.ca Wed Nov 21 11:42:26 2001
From: jharrop at shaw.ca (4j harrop)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
Message-ID: <5.0.2.1.0.20011121113440.00a208b0@pop3.norton.antivirus>

Hi, I've been a lurker on this list for some time. The conversations here have been most helpful while I've been working on getting up to speed. I have recently built a small beowulf cluster and am now looking at getting a FORTRAN90 compiler. Can anyone on the list recommend which are better for Linux (Redhat 7.2) using mpich (1.2.2.3)?
If you have negative comments that you would rather not publish to the list, please contact me directly at jharrop@shaw.ca

Thanks in advance!

John Harrop

Adapt Systems Corp
Cyberquest Geoscience Ltd

From thanhaic at yahoo.com Fri Nov 23 00:49:12 2001
From: thanhaic at yahoo.com (thanh)
Date: Wed Nov 25 01:01:53 2009
Subject: ask
Message-ID: <001f01c173fb$bbc63840$2a016481@100.1.199.aic.com.vn>

Dear,

When programming with MPICH, I want to include some classes of the Qt lib; could you show me the way to do it?
Help me!
Thanks

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20011123/ce4897fa/attachment.html

From ron_chen_123 at yahoo.com Sat Nov 24 08:28:36 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:53 2009
Subject: Problem with Sun Grid Engines qrsh
In-Reply-To: <01A24CDFE59DD411899F00A0C91012A96B4570@chewbacca.materna.de>
Message-ID: <20011124162836.12011.qmail@web14703.mail.yahoo.com>

Please send problems with SGE to the opensource mailing-list (you'll need to subscribe first). If you need commercial support for SGE, please note that there are 3 third-party companies providing support for non-Solaris platforms.

Back to your question: is your .profile calling ulimit? Also, which Linux kernel are you using?

-Ron

--- Tobias.Peuker@materna.de wrote:
> Hello,
> I have a little problem. I am setting up a parallel
> compiling farm with SGE qmake. But I have a little
> problem that I can't solve:
>
> When I try to use the qrsh command from SGE, the
> following error message occurs:
>
> bash: ulimit: cannot modify limit: Operation not
> permitted
>
> Normal RSH and everything else works perfectly.
> Does anybody have an idea how to solve this problem?
>
> Regards,
> Tobi
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or
> unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From ron_chen_123 at yahoo.com Sat Nov 24 08:50:04 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:53 2009
Subject: ask
In-Reply-To: <001f01c173fb$bbc63840$2a016481@100.1.199.aic.com.vn>
Message-ID: <20011124165004.68297.qmail@web14706.mail.yahoo.com>

What kind of problem did you encounter?

Does "mpiCC -L<path to the Qt lib> -l<Qt lib>" work?

-Ron

--- thanh wrote:
> Dear,
> When programming with MPICH, I want to include some classes
> of the Qt lib; could you show me the way to do it?
> Help me!
> Thanks

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From atctam at csis.hku.hk Sat Nov 24 23:25:08 2001
From: atctam at csis.hku.hk (Anthony Tam)
Date: Wed Nov 25 01:01:53 2009
Subject: NFS service
Message-ID: <1006673108.3c009cd4126d2@intranet.csis.hku.hk>

Hi all,

I am looking for information regarding the support of highly-available or fault-tolerant NFS service on a medium-size cluster (> 32 nodes). Any idea where I can find this information?
Thanks.
Cheers

Anthony

e Y8 d8 88
d8b Y8 88*8e d8888 88*e 88 88 88*8e Y8b Y888
d888b Y8 88 88b 88 88 88 88 88 88 88b Y8b Y8
d888888888 88 888 88 88 88 88 88 88 888 Y8b
d888 b Y8 88 888 888 88 88 88 88 88 888 88
88
88

From per at computer.org Sun Nov 25 07:48:27 2001
From: per at computer.org (Per Jessen)
Date: Wed Nov 25 01:01:53 2009
Subject: network drivers - using 3c509 and 3c515 in the same system ?
Message-ID: <3C00D8A60000B61A@mta2n.bluewin.ch> (added by postmaster@bluewin.ch)

All,

I've been working on upgrading the master node in our cluster this weekend, and hit an issue with using a 3C509 and a 3C515 card in the same system. When the 3C515 module is loaded first, loading the 3C509 module will lock the system hard. Same goes if the card is a 3C509B (PnP-capable). If instead the 3C509 module is loaded first, the 3C515 driver cannot find the 3C515 card, and refuses to load.

I looked at using the newer 3C515.c from the Scyld page, but realised that it only works with 2.2, not 2.4 - and the master node is 2.4.14. I ended up using 2 x 3C515, but would like to know if anyone else has noticed this behaviour with a combination of 3C509 and 3C515 ?

tnx,
Per Jessen

regards,
Per Jessen, Zurich
http://www.enidan.com - home of the J1 serial console.

Windows 2001: "I'm sorry Dave ... I'm afraid I can't do that."

From rgb at phy.duke.edu Sun Nov 25 08:32:25 2001
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To:
Message-ID:

On Fri, 23 Nov 2001, Don Holmgren wrote:

> At the very bottom of the page,
> http://qcdhome.fnal.gov/sse/
> I have a table with cycle counts posted for a number of matrix-matrix
> and matrix-vector routines as measured on a P-III (Coppermine), P4, and
> an Athlon MP. Times are posted for both a pure-C version of each
> routine, built with gcc, as well as for an SSE version. The sources
> for each are available at
> http://qcdhome.fnal.gov/sse/catalog.html
>
> The results are a mixed bag, with each flavor of processor sometimes first,
> second, or third. I'm using only a small subset of SSE - mostly shufps,
> addps, mulps, with a few xorps, movaps, and movups thrown in. I haven't
> timed individual instructions on all three processors.
>
> Don Holmgren
> Fermilab

Awesomely useful, Don, thanks.

Do you have any idea what the overall marginal benefit is of using your hand-optimized routines when working on large datasets (too big to fit into cache)? In particular, does performance devolve to memory-bandwidth-bound behavior (and hence end up being the same for MILC and SSE and dominated by the memory bus speed)?
> rgb

Of course YMMV, but for our application (molecular dynamics) the impact of SSE is high: a factor of 1.5 for large applications, and even more for smaller applications (see http://www.gromacs.org/benchmarks/scaling.php for comparisons). I should admit that it was very time-consuming to write all that much assembly code (but the guy did it of his own free will).

Groeten, David.
________________________________________________________________________
Dr. David van der Spoel, Biomedical center, Dept. of Biochemistry
Husargatan 3, Box 576, 75123 Uppsala, Sweden
phone: 46 18 471 4205  fax: 46 18 511 755
spoel@xray.bmc.uu.se  spoel@gromacs.org  http://zorn.bmc.uu.se/~spoel
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From rgb at phy.duke.edu Sun Nov 25 10:22:16 2001
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed Nov 25 01:01:53 2009
Subject: Fortran compilers for Linux/mpich
In-Reply-To:
Message-ID:

On Sun, 25 Nov 2001, David van der Spoel wrote:

> On Sun, 25 Nov 2001, Robert G. Brown wrote:
>
> > Do you have any idea what the overall marginal benefit is of using your
> > hand-optimized routines when working on large datasets (too big to fit
> > into cache)? In particular, does performance devolve to
> > memory-bandwidth-bound behavior (and hence end up being the same for
> > MILC and SSE and dominated by the memory bus speed)?
>
> Of course YMMV, but for our application (molecular dynamics) the impact of
> SSE is high: a factor of 1.5 for large applications, and even more for
> smaller applications (see http://www.gromacs.org/benchmarks/scaling.php
> for comparisons). I should admit that it was very time-consuming to write
> all that much assembly code (but the guy did it of his own free will).
>
> Groeten, David.

I've been meaning to go back and play with this -- there must be some way of quantifying the crossover point between CPU-bound and memory-I/O-bound code, and I've got a decent benchmark timing harness at this point that I can use to explore it. It's good to hear that it can yield a real benefit for large data codes though.

rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email: rgb@phy.duke.edu
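One crude way to put a number on that crossover is to time the same arithmetic over working sets on either side of the cache size; once the vectors spill out of cache, the flop rate collapses to whatever the memory bus can feed. A rough, illustrative C sketch (the sizes and the clock()-based timing are arbitrary choices, not anyone's actual harness):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Time y[i] += a*x[i] over vector sizes from cache-resident to far
 * larger than cache; the reported MFLOPS should drop sharply at the
 * CPU-bound to memory-bound transition. */
int main(void)
{
    size_t sizes[] = { 1024, 16384, 262144, 4194304 };  /* in doubles */
    int s;

    for (s = 0; s < 4; s++) {
        size_t n = sizes[s], i;
        long reps = (4194304 / (long)n) * 64 + 1;  /* keep total work similar */
        long r;
        double *x = malloc(n * sizeof *x);
        double *y = malloc(n * sizeof *y);
        double a = 3.0, secs;
        clock_t t0, t1;

        if (!x || !y) return 1;
        for (i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }
        t0 = clock();
        for (r = 0; r < reps; r++)
            for (i = 0; i < n; i++)
                y[i] += a * x[i];
        t1 = clock();
        secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (secs <= 0.0) secs = 1e-9;  /* guard against a too-coarse clock */
        printf("n = %8lu doubles: %7.1f MFLOPS\n",
               (unsigned long)n, 2.0 * n * reps / secs / 1e6);
        free(x);
        free(y);
    }
    return 0;
}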
From okeefe at sistina.com Sun Nov 25 17:04:25 2001
From: okeefe at sistina.com (Matt Okeefe)
Date: Wed Nov 25 01:01:53 2009
Subject: NFS service
In-Reply-To: <1006673108.3c009cd4126d2@intranet.csis.hku.hk>
References: <1006673108.3c009cd4126d2@intranet.csis.hku.hk>
Message-ID: <20011125190425.A16997@sistina.com>

On Sun, Nov 25, 2001 at 03:25:08PM +0800, Anthony Tam wrote:

> Hi all,
>
> I am looking for information regarding the support of
> highly-available or fault-tolerant NFS service on a medium-size
> cluster (> 32 nodes). Any idea where I can find this information?

Anthony,

Mission Critical Linux, among others, sells NFS fail-over software for two servers. Sistina's GFS is a Linux cluster file system that can allow multiple NFS servers to export the same shared file system to a large number of Beowulf clients (this approach allows much more scalability than just a single NFS server; you can read about it in the paper "Accelerating Technical Computing with Sistina's GFS" at www.sistina.com). If you are interested in using NFS to create a shared root partition for diskless workstations, check out the NFS cluster project at Sourceforge: http://clusternfs.sourceforge.net/

I hope this helps.

Matt O'Keefe
Sistina Software, Inc.

> Thanks.
>
> Cheers
>
> Anthony
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From ron_chen_123 at yahoo.com Sun Nov 25 20:03:35 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:53 2009
Subject: NFS service
In-Reply-To: <1006673108.3c009cd4126d2@intranet.csis.hku.hk>
Message-ID: <20011126040335.8681.qmail@web14702.mail.yahoo.com>

If you need a really reliable solution, you should use Sun Cluster:
http://www.sun.com/clusters/index.jhtml

Otherwise, if you need something cheap, maybe you can hack around with Linux-HA, with NFS over CFS.
http://www.linux-ha.org/

-Ron

--- Anthony Tam wrote:
> Hi all,
>
> I am looking for information regarding the support of
> highly-available or fault-tolerant NFS service on a medium-size
> cluster (> 32 nodes). Any idea where I can find this information?
> Thanks.
>
> Cheers
>
> Anthony
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From gabriel.weinstock at dnamerican.com Mon Nov 26 06:11:07 2001
From: gabriel.weinstock at dnamerican.com (Gabriel J. Weinstock)
Date: Wed Nov 25 01:01:53 2009
Subject: ask
In-Reply-To: <20011124165004.68297.qmail@web14706.mail.yahoo.com>
References: <20011124165004.68297.qmail@web14706.mail.yahoo.com>
Message-ID: <14250523411684@DNAMERICAN.COM>

I would suspect it would not work, although I don't have a well-thought-out reason why. I know that you can't, for instance, run svgalib programs with MPI (at least LAM-MPI). Wouldn't you need something like XMPI (LAM) or MPE (MPICH) to do graphical output?

Gabe

On Saturday 24 November 2001 11:50 am, Ron Chen wrote:
> What kind of problem did you encounter?
>
> Does "mpiCC -L<path to the Qt lib> -l<Qt lib>" work?
>
> -Ron
>
> --- thanh wrote:
> > Dear,
> > When programming with MPICH, I want to include some classes
> > of the Qt lib; could you show me the way to do it?
> > Help me!
> > Thanks
>
> __________________________________________________
> Do You Yahoo!?
> Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
> http://geocities.yahoo.com/ps/info1
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
From joe.griffin at mscsoftware.com Mon Nov 26 06:43:42 2001
From: joe.griffin at mscsoftware.com (Joe Griffin)
Date: Wed Nov 25 01:01:53 2009
Subject: 32 bit vs 64 bit computer ?
References: <001d01c173ef$a16ae600$5f72f2cb@LocalHost>
Message-ID: <3C02551E.8FA0D582@mscsoftware.com>

Yoon Jae Ho,

> I want to know the exact definition of
> the 32 bit computer (PC) vs 64 bit computer,
> and why we can't make a 128 bit computer for
> a long time?

The term "64 bit computer" is usually used for one of two types:

LP64 ..... longs and pointers are 64 bits (an example is an Intel Itanium).
ILP64 .... integers, reals, longs and pointers are 64 bits (an example is a CRAY).

LP64 systems allow for high address ranges. ILP64 allows for a high address range and greater accuracy of calculations.

To answer why we can't make a 128 bit computer, I must ask: why would you want to? 2^128 is a very big number. I cannot see the need for either that much address space or that much precision.

> I don't know how much (the maximum number)
> the 32 bit computer vs 64 bit makes exact
> calculation without error.
> And with different architecture PCs - for
> example AMD, Intel, Mac CPU - is it possible
> to communicate the calculation results to each other?

On a 32 bit system like Intel/AMD chips, real data uses the following:

1 bit ..... sign
8 bits .... exponent (magnitude of number)
23 bits ... mantissa (accuracy of number)

On a 32 bit system you may have 64 bit reals. If so:

1 bit ..... sign
11 bits ... exponent
52 bits ... mantissa

> And with the same OS - for example Linux - is it possible
> to make one beowulf using Alpha (64 bit) & Intel (32 bit) computers?

I believe a strict definition of beowulf is commodity off-the-shelf systems. I don't think Alpha is included there. But lots of people mean "cluster" when they say beowulf. You can cluster Alpha and Intel systems, but using them together is dependent on the software.

> I mean, can we communicate the calculation results with
> each other (32 bit vs 64 bit) during calculation with the same O.S.?

During the calculations??? I think not.

Regards,
Joe
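The sign/exponent/mantissa breakdown above can be poked at directly. A small, illustrative C sketch that pulls the three IEEE-754 fields out of a 32-bit float (it assumes "unsigned int" is 32 bits and shares the float's byte order, which holds on the machines discussed here):

#include <stdio.h>
#include <string.h>

/* Decompose a float into the 1/8/23-bit fields described above. */
int main(void)
{
    float f = -6.25f;
    unsigned int bits;

    memcpy(&bits, &f, sizeof bits);   /* reinterpret the bytes, no conversion */
    printf("value    = %g\n", f);
    printf("sign     = %u\n", (bits >> 31) & 0x1u);
    printf("exponent = %u (biased by 127)\n", (bits >> 23) & 0xffu);
    printf("mantissa = 0x%06x\n", bits & 0x7fffffu);
    return 0;
}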
From lmeerkat at yahoo.com Mon Nov 26 14:51:43 2001
From: lmeerkat at yahoo.com (L. Gritsenko)
Date: Wed Nov 25 01:01:53 2009
Subject: How to add Beowulf node with SCSI HD?
Message-ID: <20011126225143.27491.qmail@web20604.mail.yahoo.com>

Hi,

I am using Scyld Beowulf 27bz-8. I booted a node that has a SCSI hard drive from a master that has an IDE hard drive. After the node was set up to the "boot" state, I received the following message in the log file: "/dev/hda: No such device". Yes, it is correct that I do not have any "hda" on this node, but I do still have "/dev/sda" there! What do I need to change in the boot procedure in order to solve this problem? I believe I can add a node which has a SCSI hard drive.

Thanks,
Lyudmila Gritsenko

=====

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From vanw at tticluster.com Tue Nov 27 08:41:37 2001
From: vanw at tticluster.com (Kevin Van Workum)
Date: Wed Nov 25 01:01:53 2009
Subject: FORTRAN compilers
In-Reply-To: <5.0.2.1.0.20011121113440.00a208b0@pop3.norton.antivirus>
Message-ID: <001c01c17762$64ce8140$63b36880@aframe>

If you'd like to benchmark Lahey Fortran with MPICH on a 1.3 GHz AMD cluster with DDR RAM, check out these sites:

www.tsunamictechnologies.com
www.lahey.com

Kevin Van Workum
University of Wisconsin

> -----Original Message-----
> From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org] On
> Behalf Of 4j harrop
> Sent: Wednesday, November 21, 2001 1:42 PM
> To: Beowulf mailing list
> Subject: FORTRAN compilers
>
> Hi, I've been a lurker on this list for some time. The conversations here
> have been most helpful while I've been working on getting up to speed. I
> have recently built a small beowulf cluster and am now looking at getting a
> FORTRAN90 compiler. Can anyone on the list recommend which are better for
> Linux (Redhat 7.2) using mpich (1.2.2.3)?
>
> If you have negative comments that you would rather not publish to the
> list, please contact me directly at jharrop@shaw.ca
>
> Thanks in advance!
>
> John Harrop
>
> Adapt Systems Corp
> Cyberquest Geoscience Ltd
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

From becker at scyld.com Tue Nov 27 10:11:37 2001
From: becker at scyld.com (Donald Becker)
Date: Wed Nov 25 01:01:53 2009
Subject: How to add Beowulf node with SCSI HD?
In-Reply-To: <20011126225143.27491.qmail@web20604.mail.yahoo.com>
Message-ID:

On Mon, 26 Nov 2001, L. Gritsenko wrote:

> I am using Scyld Beowulf 27bz-8. I booted a node that
> has a SCSI hard drive from a master that has an IDE hard
> drive. After the node was set up to the "boot"
> state, I received the following message in the log
> file: "/dev/hda: No such device".

This is a harmless message. To avoid seeing it, comment out the 'hdparm' call in the node_up script. (The base script is in /etc/beowulf/node_up, but that script just calls the /usr/lib/beoboot/bin/node_up script.) Newer releases take care not to emit this confusing message.

Donald Becker                          becker@scyld.com
Scyld Computing Corporation            http://www.scyld.com
410 Severn Ave. Suite 210              Second Generation Beowulf Clusters
Annapolis MD 21403                     410-990-9993

From mlrecv at yahoo.com Tue Nov 27 11:27:52 2001
From: mlrecv at yahoo.com (Zhifeng Chen)
Date: Wed Nov 25 01:01:53 2009
Subject: SMP support comparison between NT and Linux
Message-ID: <20011127192752.1050.qmail@web14810.mail.yahoo.com>

Hi,

Are there any review articles or comments comparing SMP support between NT and Linux? Which is better?

ZF

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From xfye at mail.ustc.edu.cn Tue Nov 27 18:15:47 2001
From: xfye at mail.ustc.edu.cn (XianFeng Ye)
Date: Wed Nov 25 01:01:53 2009
Subject: about SCSI HD & F90
In-Reply-To: <200111271703.fARH3W025958@blueraja.scyld.com>
Message-ID:

> I am using Scyld Beowulf 27bz-8. I booted a node that
> has a SCSI hard drive from a master that has an IDE hard
> drive. After the node was set up to the "boot"
> state, I received the following message in the log
> file: "/dev/hda: No such device".
Yes, it is. Maybe you can do it like this:

ln -sf /dev/sda /dev/hda

> > FORTRAN90 compiler. Can anyone on the list recommend which are better
> > for Linux (Redhat 7.2) using mpich (1.2.2.3)?

Maybe pgfortran can do this. I am puzzled that when I compile an f77 program with pgf77 (3.2) and g77 (2.95), pgf77 can't surpass g77. Can someone comment on this?

From scheinin at crs4.it Wed Nov 28 02:48:49 2001
From: scheinin at crs4.it (Alan Scheinine)
Date: Wed Nov 25 01:01:53 2009
Subject: AMD 760 MPX ?
Message-ID: <200111281048.LAA10712@dylandog.crs4.it>

I have been trying to avoid polluting this newsgroup with a useless question, but I cannot contain myself any longer. In a very nice article by Anand Lal Shimpi written on 5 June 2001, we can read "Don't expect too many manufacturers other than Tyan to have a board [with the 760MPX] until mid-late Q3 2001." (On the copy I printed for myself I do not see the URL.) Someone else wrote around the start of November that the 760MPX would be announced in mid-November. Any news?

best regards,
Alan Scheinine   Email: scheinin@crs4.it

From lindahl at conservativecomputer.com Wed Nov 28 03:17:08 2001
From: lindahl at conservativecomputer.com (Greg Lindahl)
Date: Wed Nov 25 01:01:53 2009
Subject: AMD 760 MPX ?
In-Reply-To: <200111281048.LAA10712@dylandog.crs4.it>; from scheinin@crs4.it on Wed, Nov 28, 2001 at 11:48:49AM +0100
References: <200111281048.LAA10712@dylandog.crs4.it>
Message-ID: <20011128061708.A4935@wumpus.foo>

On Wed, Nov 28, 2001 at 11:48:49AM +0100, Alan Scheinine wrote:

> Someone else wrote around the start of November that the 760MPX
> would be announced in mid-November. Any news?

I saw 2 vendors at the SC2001 show in Denver with running 760MPX machines. Neither was the final product, however.

greg

From aleahy at knox.edu Wed Nov 28 05:14:35 2001
From: aleahy at knox.edu (Andrew Leahy)
Date: Wed Nov 25 01:01:53 2009
Subject: AMD 760 MPX ?
References: <200111281048.LAA10712@dylandog.crs4.it>
Message-ID: <3C04E33B.B53C035@knox.edu>

Alan Scheinine wrote:
>
> I have been trying to avoid polluting this newsgroup with
> a useless question, but I cannot contain myself any longer.
> In a very nice article by Anand Lal Shimpi written on 5 June 2001,
> we can read "Don't expect too many manufacturers other than Tyan
> to have a board [with the 760MPX] until mid-late Q3 2001." (On
> the copy I printed for myself I do not see the URL.) Someone
> else wrote around the start of November that the 760MPX would be
> announced in mid-November. Any news?
> best regards,
> Alan Scheinine   Email: scheinin@crs4.it

There was a post about this at 2cpu.com recently (a good place for dual-processor news/rumors). The link they point to is:

http://www.theinquirer.org/27110112.htm

But I've been reading these "they're almost here" articles for a while now, so take it with a grain of salt.

Andrew Leahy
aleahy@knox.edu

From jared_hodge at iat.utexas.edu Wed Nov 28 07:29:18 2001
From: jared_hodge at iat.utexas.edu (Jared Hodge)
Date: Wed Nov 25 01:01:53 2009
Subject: Channel Bonding Question
Message-ID: <3C0502CE.95EF747A@iat.utexas.edu>

I was wondering if it is possible to link two ethernet NICs (channel bonding, sort of) on our server to work together talking to a single switch. I've lately come to realize that most work with channel bonding requires two entirely separate networks, but what I want to do is connect the two NICs to the switch (Cisco Catalyst) and allow it to effectively communicate with two of the nodes at full speed at the same time.
I guess that this would be more along the lines of line trunking or multi-link or some other networking scheme. If anyone knows of any links that describe how to do this, I would appreciate it. Thanks.

--
Jared Hodge
Institute for Advanced Technology
The University of Texas at Austin
3925 W. Braker Lane, Suite 400
Austin, Texas 78759
Phone: 512-232-4460  Fax: 512-471-9096
Email: Jared_Hodge@iat.utexas.edu

From ak at dkp.com Wed Nov 28 08:25:39 2001
From: ak at dkp.com (Andrew Klaassen)
Date: Wed Nov 25 01:01:53 2009
Subject: Channel Bonding Question
In-Reply-To: <3C0502CE.95EF747A@iat.utexas.edu>
References: <3C0502CE.95EF747A@iat.utexas.edu>
Message-ID: <20011128112539.D2508@dkp.com>

On Wed, Nov 28, 2001 at 09:29:18AM -0600, Jared Hodge wrote:

> I was wondering if it is possible to link two ethernet NICs
> (channel bonding, sort of) on our server to work together
> talking to a single switch. I've lately come to realize that
> most work with channel bonding requires two entirely separate
> networks, but what I want to do is connect the two NICs to
> the switch (Cisco Catalyst) and allow it to effectively
> communicate with two of the nodes at full speed at the same
> time. I guess that this would be more along the lines of line
> trunking or multi-link or some other networking scheme. If
> anyone knows of any links that describe how to do this, I
> would appreciate it. Thanks.

No link, but here are the config files we need in order to make this work on a Redhat box (from the /etc/sysconfig/network-scripts directory):

---ifcfg-bond0---
DEVICE=bond0
BOOTPROTO=static
BROADCAST=192.168.0.255
IPADDR=192.168.0.181
NETMASK=255.255.255.0
NETWORK=192.168.0.0
ONBOOT=yes

---ifcfg-eth0---
DEVICE=eth0
BOOTPROTO=static
MASTER=bond0
SLAVE=yes
ONBOOT=yes

---ifcfg-eth1---
DEVICE=eth1
BOOTPROTO=static
MASTER=bond0
SLAVE=yes
ONBOOT=yes

And, in /etc/modules.conf:

alias bond0 bonding

The switch also needs to be set up for this. We've got an HP and a Foundry switch both doing it; one calls it "Fast EtherChannel" (originally a Cisco term?), the other "Trunking", and the Linux box "bonding". Setup was pretty straightforward once I figured out where in the switch manuals everything was...

Andrew Klaassen

From josip at icase.edu Wed Nov 28 09:50:46 2001
From: josip at icase.edu (Josip Loncaric)
Date: Wed Nov 25 01:01:53 2009
Subject: Xbox clusters?
Message-ID: <3C0523F6.254E0EE9@icase.edu>

Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about $300. This would make it a reasonably powerful cluster node with an excellent price/performance ratio. Of course, the thing runs a slimmed-down variant of Windows 2000 instead of Linux, but has anyone discussed making an Xbox cluster?

Sincerely,
Josip

P.S. Sony's PS2 Linux Beta Kit was announced this April in Japan (see http://ps2.ign.com/news/33873.html or http://www.zdnet.com/zdnn/stories/news/0,4586,2712751,00.html). Xbox will probably not see anything similar from Microsoft. Other game boxes may be less powerful, but may have better prospects with Linux.

--
Dr. Josip Loncaric, Research Fellow               mailto:josip@icase.edu
ICASE, Mail Stop 132C         PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric@larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192    Fax +1 757 864-6134

From rgb at phy.duke.edu Wed Nov 28 10:37:58 2001
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed Nov 25 01:01:53 2009
Subject: Xbox clusters?
In-Reply-To: <3C0523F6.254E0EE9@icase.edu>
Message-ID:

On Wed, 28 Nov 2001, Josip Loncaric wrote:

> Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory,
> a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about
> $300. This would make it a reasonably powerful cluster node with an
> excellent price/performance ratio. Of course, the thing runs a
> slimmed-down variant of Windows 2000 instead of Linux, but has anyone
> discussed making an Xbox cluster?
>
> Sincerely,
> Josip

Dear Josip,

Case                  $60
Motherboard          $100
Athlon XP 1500       $150
256 MB PC2100 DDR     $40
100BT NIC             $20
=========================
Total                $370, with optional small HD and video $500-550.

Even assuming no better than direct clock speed scaling between the 1.4 GHz 1500 and the 733 MHz PIII, even ignoring the scalability and manageability and parallel software support advantages of linux, even ignoring the speed advantages of 256 MB of DDR over 64 MB of SDRAM, even ignoring Amdahl's law (where one cpu at speed 2X is generally "better" than two cpus at speed X), this still makes no economic sense, in that the aggregate 1467 MHz / 1400 MHz = 1.05 but $600/$500 = 1.2.

And you get to run linux. And you get the DDR. And you get 2-3x the HD disk. And you don't have to run Windows or add to the greatest/worst monopoly the world has ever seen. And you get to choose your NIC. And you get to run linux.

I doubt it is worth it even for $250/node. Perhaps $200.

rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email: rgb@phy.duke.edu

From jared_hodge at iat.utexas.edu Wed Nov 28 10:41:32 2001
From: jared_hodge at iat.utexas.edu (Jared Hodge)
Date: Wed Nov 25 01:01:53 2009
Subject: Channel Bonding Question
References: <3C0502CE.95EF747A@iat.utexas.edu> <3C052E53.34C3A9A0@obs.unige.ch>
Message-ID: <3C052FDC.4EE5BF3D@iat.utexas.edu>

I thought that might be the case, but I've heard of software (Cisco trunking, I think) that can create a virtual IP for a NIC that doesn't exist; when something is sent to it, the software splits it across the two NICs and reassembles it there. I think you have to make sure nothing goes to the original IPs, though.

Daniel Pfenniger wrote:
>
> Jared Hodge wrote:
> >
> > I was wondering if it is possible to link two ethernet NICs (channel
> > bonding, sort of) on our server to work together talking to a single
> > switch. I've lately come to realize that most work with channel bonding
> > requires two entirely separate networks, but what I want to do is
> > connect the two NICs to the switch (Cisco Catalyst) and allow it to
> > effectively communicate with two of the nodes at full speed at the same
> > time. I guess that this would be more along the lines of line trunking
> > or multi-link or some other networking scheme. If anyone knows of any
> > links that describe how to do this, I would appreciate it. Thanks.
>
> In that case each NIC must have one (or more) distinct IP number, so your
> applications should be able to manage that.
>
> Dan

--
Jared Hodge
Institute for Advanced Technology
The University of Texas at Austin
3925 W. Braker Lane, Suite 400
Austin, Texas 78759
Phone: 512-232-4460  Fax: 512-471-9096
Email: Jared_Hodge@iat.utexas.edu

From math at velocet.ca Wed Nov 28 11:04:55 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:54 2009
Subject: Xbox clusters?
In-Reply-To: <3C0523F6.254E0EE9@icase.edu>; from josip@icase.edu on Wed, Nov 28, 2001 at 12:50:46PM -0500
References: <3C0523F6.254E0EE9@icase.edu>
Message-ID: <20011128140455.E1210@velocet.ca>

On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all...

> Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory,
> a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about
> $300. This would make it a reasonably powerful cluster node with an
> excellent price/performance ratio. Of course, the thing runs a
> slimmed-down variant of Windows 2000 instead of Linux, but has anyone
> discussed making an Xbox cluster?

Why bother when for about $300 USD you can put together a cluster node with a 1.333 GHz Athlon with 256 MB of DDR RAM?

'Sides, who brought 'price/performance' onto this list? Don't you know that's never a factor on the beowulf list? :)

/kc

> Sincerely,
> Josip
>
> P.S. Sony's PS2 Linux Beta Kit was announced this April in Japan
> (see http://ps2.ign.com/news/33873.html or
> http://www.zdnet.com/zdnn/stories/news/0,4586,2712751,00.html). Xbox
> will probably not see anything similar from Microsoft. Other game boxes
> may be less powerful, but may have better prospects with Linux.
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

--
Ken Chase, math@velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA

From joseph.keen at eglin.af.mil Wed Nov 28 11:19:12 2001
From: joseph.keen at eglin.af.mil (Keen Joseph M Contr 46 SK/SKE)
Date: Wed Nov 25 01:01:54 2009
Subject: Scyld boot problem
Message-ID: <0FA55B4C91D3D411BDD4009027724DDAE4DCE9@eg-002-009.eglin.af.mil>

Greetings,

I'm looking for some help on a problem we're having with getting the demo Scyld distribution working on our cluster. I'm not the admin for the cluster and will probably omit some critical information on the first pass, so please bear with me.

The cluster configuration consists of 8 single-processor nodes and 8 dual-processor nodes. The single-cpu nodes boot without problem. The dual-cpu nodes do not. The screen information indicates a problem after the partition check at the "end of phase 1". The following message appears:

Invalid session number or type of track
Kernel panic: VFS: Unable to mount root fs on 03:05
Rebooting in 30 seconds ...

This results in a continuous boot loop. We get this same result for each of the dual-cpu nodes. Any ideas/suggestions?

Thanks,
Joe

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20011128/8fdf8727/attachment.html

From math at velocet.ca Wed Nov 28 11:40:19 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:54 2009
Subject: Xbox clusters?
> > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about > > $300. This would make it a reasonably powerful cluster node with an > > excellent price/performance ratio. Of course, the thing runs a > > slimmed-down variant of Windows 2000 instead of Linux, but has anyone > > discussed making an Xbox cluster? > > Why bother when for about $300 USD you can put together a > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > Sides, who brought 'price/performance' onto this list? Dont know thats never a > factor on the beowulf list? :) So, the question is, with these numbers, how do people end up spending $250K on 40 or even 60-CPU clusters? /kc From bargle at umiacs.umd.edu Wed Nov 28 11:42:37 2001 From: bargle at umiacs.umd.edu (Gary Jackson) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? In-Reply-To: Your message of "Wed, 28 Nov 2001 14:04:55 EST." <20011128140455.E1210@velocet.ca> Message-ID: <200111281942.OAA16730@leviathan.umiacs.umd.edu> On Wed, 28 Nov 2001, Velocet wrote: >Why bother when for about $300 USD you can put together a >cluster node with a 1.333GHz athlon with 256Mb of DDR ram? Because you don't have to "pay" for assembly, or debugging the equipment, or anything like that. You even get a 90 day warranty. With a self assembled beige box, it may take you 90 days to figure out which part is broken. -- Gary Jackson bargle@umiacs.umd.edu From dwu at Swales.com Wed Nov 28 11:48:58 2001 From: dwu at Swales.com (Dominic Wu) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? In-Reply-To: <3C0523F6.254E0EE9@icase.edu> Message-ID: It runs an XP variant and the RAM seems to be a bit on the low side. -----Original Message----- From: beowulf-admin@beowulf.org [mailto:beowulf-admin@beowulf.org]On Behalf Of Josip Loncaric Sent: Wednesday, November 28, 2001 9:51 AM To: Beowulf mailing list Subject: Xbox clusters? Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about $300. This would make it a reasonably powerful cluster node with an excellent price/performance ratio. Of course, the thing runs a slimmed-down variant of Windows 2000 instead of Linux, but has anyone discussed making an Xbox cluster? Sincerely, Josip P.S. Sony's PS2 Linux Beta Kit has been announced this April in Japan (see http://ps2.ign.com/news/33873.html or http://www.zdnet.com/zdnn/stories/news/0,4586,2712751,00.html). Xbox will probably not see anything similar from Microsoft. Other game boxes may be less powerful, but may have better prospects with Linux. -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From wsb at paralleldata.com Wed Nov 28 12:25:37 2001 From: wsb at paralleldata.com (W Bauske) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? References: <3C0523F6.254E0EE9@icase.edu> <20011128140455.E1210@velocet.ca> <20011128144018.G1210@velocet.ca> Message-ID: <3C054841.21352D4D@paralleldata.com> Velocet wrote: > > On Wed, Nov 28, 2001 at 02:04:55PM -0500, Velocet's all... 
> > On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all... > > > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, > > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about > > > $300. This would make it a reasonably powerful cluster node with an > > > excellent price/performance ratio. Of course, the thing runs a > > > slimmed-down variant of Windows 2000 instead of Linux, but has anyone > > > discussed making an Xbox cluster? > > > > Why bother when for about $300 USD you can put together a > > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > > > Sides, who brought 'price/performance' onto this list? Dont know thats never a > > factor on the beowulf list? :) > > So, the question is, with these numbers, how do people end up spending > $250K on 40 or even 60-CPU clusters? > They buy from IBM/Compaq/HP or pick your favorite mainstream vendor. It's unusual for a large corp to be off putting it's own PC's together. Wes From j.c.burton at gats-inc.com Wed Nov 28 12:50:13 2001 From: j.c.burton at gats-inc.com (John Burton) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? References: <200111281942.OAA16730@leviathan.umiacs.umd.edu> Message-ID: <3C054E04.83608604@gats-inc.com> Gary Jackson wrote: > On Wed, 28 Nov 2001, Velocet wrote: > > >Why bother when for about $300 USD you can put together a > >cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > Because you don't have to "pay" for assembly, or debugging the > equipment, or anything like that. You even get a 90 day warranty. > With a self assembled beige box, it may take you 90 days to figure out > which part is broken. Ummmm....speak for yourself. I've been putting together these "self assembled beige box" for many years and currently have about 5% component DOA rate, and about another 1% infant mortality rate (crap out within 30 days). Takes on average 4 hours to determine what the bad component is and 24-48 hours to replace it. I've never spent more than 1 week "figuring out" which part is broken. The time I spent 1 week was due to a flakey memory chip that was causing filesystem errors in a 90GB RAID 5 array. Flakey memory is difficult to track down because it can masquerade as virtually anything else... With the current components, you put it together and it either works or doesn't. If it doesn't you can usually zero in on the problem pretty quickly... buy quality components that you know work together and your job is even easier John From math at velocet.ca Wed Nov 28 12:58:35 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? In-Reply-To: <20011128155517.A27265@sauerburger.nrl.navy.mil>; from stephan@sauerburger.nrl.navy.mil on Wed, Nov 28, 2001 at 03:55:17PM -0500 References: <3C0523F6.254E0EE9@icase.edu> <20011128140455.E1210@velocet.ca> <20011128155517.A27265@sauerburger.nrl.navy.mil> Message-ID: <20011128155835.I1210@velocet.ca> On Wed, Nov 28, 2001 at 03:55:17PM -0500, Stephan Sauerburger's all... > Where at? Pricewatch? And does that include HDD? ya you can get them for under $100USD. go check out pricewatch and find a store you can buy the whole kit from. (considering my designs I mighta left the case out tho. We rack our stuff into custom cabinets). /kc > > > ~Stephan > > > On Wed, Nov 28, 2001 at 02:04:55PM -0500, Velocet wrote: > > On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all... 
> > > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, > > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about > > > $300. This would make it a reasonably powerful cluster node with an > > > excellent price/performance ratio. Of course, the thing runs a > > > slimmed-down variant of Windows 2000 instead of Linux, but has anyone > > > discussed making an Xbox cluster? > > > > Why bother when for about $300 USD you can put together a > > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > > > Sides, who brought 'price/performance' onto this list? Dont know thats never a > > factor on the beowulf list? :) > > > > /kc > > > > > Sincerely, > > > Josip > > > > > > P.S. Sony's PS2 Linux Beta Kit has been announced this April in Japan > > > (see http://ps2.ign.com/news/33873.html or > > > http://www.zdnet.com/zdnn/stories/news/0,4586,2712751,00.html). Xbox > > > will probably not see anything similar from Microsoft. Other game boxes > > > may be less powerful, but may have better prospects with Linux. > > > > > > > > > -- > > > Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu > > > ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ > > > NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov > > > Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 > > > _______________________________________________ > > > Beowulf mailing list, Beowulf@beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > > Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From rgb at phy.duke.edu Wed Nov 28 13:12:57 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? In-Reply-To: <20011128144018.G1210@velocet.ca> Message-ID: On Wed, 28 Nov 2001, Velocet wrote: > On Wed, Nov 28, 2001 at 02:04:55PM -0500, Velocet's all... > > On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all... > > > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, > > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about > > > $300. This would make it a reasonably powerful cluster node with an > > > excellent price/performance ratio. Of course, the thing runs a > > > slimmed-down variant of Windows 2000 instead of Linux, but has anyone > > > discussed making an Xbox cluster? > > > > Why bother when for about $300 USD you can put together a > > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > > > Sides, who brought 'price/performance' onto this list? Dont know thats never a > > factor on the beowulf list? :) > > So, the question is, with these numbers, how do people end up spending > $250K on 40 or even 60-CPU clusters? Well, start with $300 rackmount cases (a rackmount case alone can easily cost more than an Xbox). Add a high end P4 motherboard, the fastest P4-Xeon, and fully populate the MoBo with the biggest, most expensive RDRAM sticks you can find. Get a big, fast SCSI drive and controller. Finish off with the fastest network you can arrange. 
The high speed network alone can cost $2K/node, and one can easily enough spend $2K on a rackmount P4 node (exclusive of the high-speed network). Besides, a lot of the top-end numbers are (or at any rate were) generated by alpha/myrinet clusters, where individual nodes could easily run $6K, with discount, NOT including the network, maybe $8K/node including the network. One could drop more than $500K on a 64 node cluster without even breaking a sweat. Note that this sort of high end cluster was (and really still is) appropriate for moderately fine-grained parallel computations, where one needs to spend proportionally much more for the network than usual, and where the fastest possible processors with the fastest and biggest memory can help control the ratio of serial code fraction to parallel code fraction, allowing one to actually scale an application UP to 64 nodes. Yes, one might be able to afford 1000 AMD nodes on some agglomeration of daisy chained switches for the same $500K (if you could afford to house and feed them given that they would consume some 100 KW or more in operation). Yes, those 1000 nodes might have 2-3x the aggregate power of the really expensive cluster for the same money. However, if >>your<< problem only scales to 6 nodes with that ratio of CPU speed to network speed, the giant AMD cluster is obviously not smart. There is a tremendous range of variation in cluster designs, with all sorts of mixes of investment in node speed, memory speed, network topology and speed, and while the "standard recipe" beowulfish cluster (pile of PC's, switched 100BT, linux) is right for some (indeed, right for me:-) it isn't right for everybody. So Josip's question was really relevant and one that we've kicked around on this list some before -- one day game systems may well be viable candidates as nodes. I don't think the Xbox is there yet. The new/future Sonies may be, but I'm not so certain. The problem is: All PC's can play games, many of them as well or better than a dedicated gaming box. PC's can do much more -- they are general purpose. The parts for a PC are all commodity and largely interchangeable. These factors conspire to keep PC's as powerful AND cheap as they can reasonably be. Game boxes nowadays have to be able to do nearly everything a PC can do -- a motherboard with integrated graphics, sound and network is just about a game box on a board, lacking only an operating system and some I/O channels. There is such a small and narrowing window in between these two extremes that I'm not at all convinced that there will EVER be an advantage in using game systems as nodes. By the time they have the features and expandability of a PC-based node, they will necessarily reach the PC in price point or somebody will just repackage the node and sell it as a PC (and so reach the PC in price point). Anyway, over the many years I've seen "thin" or "special purpose" systems of all sorts come with much hooraw and seen them go again like thieves in the night, with most souls sorry they ever bought them. The general purpose cost/benefit sweet spot is right in the middle of the PC commodity market because market forces evolve it that way, and only rarely does a processor based "computational" design (excluding the vast world of controllers) come along that really can sustain a special purpose market let alone be backportable to general purpose use. This is the Lesson of the Wang. (At least for those of you old enough to remember what one is...:-) rgb -- Robert G. 
Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From ak at dkp.com Wed Nov 28 13:19:58 2001 From: ak at dkp.com (Andrew Klaassen) Date: Wed Nov 25 01:01:54 2009 Subject: Channel Bonding Question In-Reply-To: <20011128112539.D2508@dkp.com> References: <3C0502CE.95EF747A@iat.utexas.edu> <20011128112539.D2508@dkp.com> Message-ID: <20011128161956.H2508@dkp.com> On Wed, Nov 28, 2001 at 11:25:39AM -0500, I wrote: > On Wed, Nov 28, 2001 at 09:29:18AM -0600, > Jared Hodge wrote: > > I was wondering if it is possible to link two ethernet NICs > > (channel bonding, sort of) on our server to work together > > talking to a single switch. I've lately come to realize that > > most work with channel bonding > > requires two entirely separate networks, but what I want to do is > > connect the two NICs two > > the switch (Cisco Catalyst) and allow it to effectively > > communicate with two of the nodes at full speed at the same > > time. I guess that this would be more along the lines of line > > trunking or multi-link or some other networking scheme. If > > anyone knows of any links that describe how to do this, I > > would appreciate it. Thanks. > No link, but here are the config files we need in order to make > this work on a Redhat box... Ah - I had a chance to look through the Redhat startup scripts, and it looks like all you need on the Linux box side of things is ifenslave. From the ifenslave manpage:

  # modprobe bonding
  # ifconfig bond0 192.168.0.1 netmask 255.255.0.0
  # ifenslave bond0 eth0 eth1

Hope that helps. Andrew Klaassen From josip at icase.edu Wed Nov 28 13:23:32 2001 From: josip at icase.edu (Josip Loncaric) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? References: Message-ID: <3C0555D4.56F3CA99@icase.edu> "Robert G. Brown" wrote: > > On Wed, 28 Nov 2001, Josip Loncaric wrote: > > > has anyone discussed making an Xbox cluster? > > I doubt it is worth it even for $250/node. Perhaps $200. You may be right. A cluster node does not need the Xbox-style fancy graphics, DVD drive, nor (sometimes) the hard drive, but it would need more memory and more software flexibility. However, the appeal of buying compact preconfigured CPU+RAM+NIC building blocks remains... BTW, the big monopolist selling the Xbox is supposedly losing $100 per unit, which they won't recover from people who play "Linux cluster games" instead of buying the usual crash-burn-maim commercial fare. Linux-on-Xbox idea is unlikely to get any help, even though Sony's Linux-on-PS2 was tried in Japan. Sincerely, Josip -- Dr. Josip Loncaric, Research Fellow mailto:josip@icase.edu ICASE, Mail Stop 132C PGP key at http://www.icase.edu./~josip/ NASA Langley Research Center mailto:j.loncaric@larc.nasa.gov Hampton, VA 23681-2199, USA Tel. +1 757 864-2192 Fax +1 757 864-6134 From patrick at myri.com Wed Nov 28 13:23:27 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? References: <200111281048.LAA10712@dylandog.crs4.it> Message-ID: <3C0555CF.1C29FD19@myri.com> Hi Alan, Alan Scheinine wrote: > the copy I printed for myself I do not see the URL.) Someone > else wrote around the start of November that the 760MPX will be > announced in mid-November. Any news? We have received two machines for tests from AMD 3 weeks ago, and we took them to SC01. I don't know about AMD's schedule for official release. I can only say that we are very (VERY) pleased with these boxes.
I think it will be a best choice for a lot of clusters. Patrick From rgb at phy.duke.edu Wed Nov 28 13:47:36 2001 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? In-Reply-To: <200111281942.OAA16730@leviathan.umiacs.umd.edu> Message-ID: On Wed, 28 Nov 2001, Gary Jackson wrote: > On Wed, 28 Nov 2001, Velocet wrote: > > >Why bother when for about $300 USD you can put together a > >cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > Because you don't have to "pay" for assembly, or debugging the > equipment, or anything like that. You even get a 90 day warranty. > With a self assembled beige box, it may take you 90 days to figure out > which part is broken. Surely you jest. The systems I buy come with a lifetime labor warranty and typically have a year parts warranty. The vendor will assemble them for me basically for free when I buy in bulk or maybe for $50 each if I'm buying only one or two. I generally buy the parts and build them myself in the latter case to save the money. With my trusty electric screwdriver, I can build a system out of component parts in about 30 minutes, and so can pretty nearly anyone on this list. Motherboard screws onto the case. Drives screw onto rails or into popout cages. CPU snaps in, memory snaps in, cards snap in. The hardest single thing is the cabling -- gotta connect all these itty-bitty lines from the case to the motherboard in the right places. Power is simple. Drive cables are simple. Building a lego castle with my sons is MUCH harder. So is assembling a bicycle. Maintenance is usually pretty simple. The parts most likely to fail are the drives (obvious), power supply, and the CPU/motherboard (also obvious). When buying just one system, it does help to have a local service department to play the swap game. If you are buying fifty, though, spending a few dollars more on a set of swap-em parts (or just borrowing them from a known-good system) to determine what is wrong is no big deal and almost never takes more than an hour or two of time. Then, all the parts are >>cheap and readily available<< and one can often fix the system entirely in times ranging from one hour to an afternoon. I'm also reasonably confident that I'll be able to fix the system (for ever decreasing prices) through at least the first 3-5 years of ownership before it becomes no longer worth it. Now how, exactly, are you going to get an Xbox fixed after its 90 days runs out? Is it a bad CPU, dust on the CD drive, a crashed hard disk, a bad power supply, a bad memory chip? No real OS, no diagnostics. Nobody this side of the factory with spare parts for at least part of what could be wrong. You'll end up either playing the swap game (if you are lucky) with whatever parts inside are indeed commodity with even less to go on than you might have with a real computer OR mailing it in for depot repair OR throwing it away. One round of depot repair will likely cost half as much as the system itself -- $50/hour for labor plus parts plus shipping both ways. Throwing it away costs the whole system. Fixing it yourself? Well, which one would YOU rather fix -- a system you built yourself designed to be expandable and easy to fix or a box deliberately engineered to be "closed" to customers and ultimately disposable so they can sell you more? Just my opinion, of course...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From paullu at cs.ualberta.ca Wed Nov 28 13:47:47 2001 From: paullu at cs.ualberta.ca (Paul Lu) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? In-Reply-To: <3C0555CF.1C29FD19@myri.com>; from patrick@myri.com on Wed, Nov 28, 2001 at 04:23:27PM -0500 References: <200111281048.LAA10712@dylandog.crs4.it> <3C0555CF.1C29FD19@myri.com> Message-ID: <20011128144747.V18934@cs.ualberta.ca> Hello: On Wed, Nov 28, 2001 at 04:23:27PM -0500, Patrick Geoffray wrote: > We have received two machines for tests from AMD 3 weeks ago, > and we took them to SC01. I don't know about AMD's schedule > for official release. I can only say that we are very (VERY) > pleased with these boxes. > > I think it will be a best choice for a lot of clusters. To the extent that you can/are allowed, would you care to comment on how well these boards perform, especially wrt 64-bit/66 MHz Myrinet interfaces? We will be ordering a Myrinet-based cluster shortly and this information would be helpful. Thank you, ...Paul From lindahl at conservativecomputer.com Wed Nov 28 14:06:49 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? In-Reply-To: <20011128144747.V18934@cs.ualberta.ca>; from paullu@cs.ualberta.ca on Wed, Nov 28, 2001 at 02:47:47PM -0700 References: <200111281048.LAA10712@dylandog.crs4.it> <3C0555CF.1C29FD19@myri.com> <20011128144747.V18934@cs.ualberta.ca> Message-ID: <20011128170649.A1825@wumpus.foo> On Wed, Nov 28, 2001 at 02:47:47PM -0700, Paul Lu wrote: > To the extent that you can/are allowed, would you care to comment on > how well these boards perform, especially wrt 64-bit/66 MHz Myrinet > interfaces? Patrick's probably getting tired of saying, "We signed a nondisclosure form." Rest assured that the instant anyone gets an actual release version and isn't under NDA, I'll publish the Myrinet PCI test results on my Myrinet performance webpages. greg From dvos12 at calvin.edu Wed Nov 28 14:11:09 2001 From: dvos12 at calvin.edu (David Vos) Date: Wed Nov 25 01:01:54 2009 Subject: custom hardware (was: Xbox clusters?) In-Reply-To: <3C054E04.83608604@gats-inc.com> Message-ID: On Wed, 28 Nov 2001, John Burton wrote: > Ummmm....speak for yourself. I've been putting together these "self > assembled beige box" for many years and currently have about 5% > component DOA rate, and about another 1% infant mortality rate (crap > out within 30 days). Takes on average 4 hours to determine what the > bad component is and 24-48 hours to replace it. I've never spent more > than 1 week "figuring out" which part is broken. The time I spent 1 > week was due to a flakey memory chip that was causing filesystem > errors in a 90GB RAID 5 array. Flakey memory is difficult to track > down because it can masquerade as virtually anything else... There is one computer in our cluster that would make me think twice before doing a custom build. I prefer to call it the node from heck. It only has one problem: it won't boot. If you press the power button, the powerlight flashes while the cpu and case fans turn a quarter turn, then nothing. You have to wait a minute before you even get that reaction again. (Sounds like a short somewhere). The problem only surfaces if the computer has been off for a little while, and nearly every time at that.

1st Occurrence (several months ago). Try new power supply. No go. Remove drives, cards, etc. from motherboard until only (new) PS(power supply), Motherboard, Mem, and CPU. Nope. Swap mem. Nope. Swap CPU. Nope. Sounds like the motherboard (I replaced everything else). I return the original parts (and drop a screwdriver on the motherboard by accident), and it suddenly starts working. I put the computer back in and it runs fine with everything the way it was before.

2nd Occurrence (a month or so later). I knew it was a bad motherboard last time, so I replaced the motherboard. Worked great.

3rd Occurrence (a month or so later). I take things apart and put them back together. Starts working. Now I'm starting to get confused.

4th Occurrence (a month or so later). I remove drives and cards, put in spare PS. Nothing. Remove motherboard and put on a piece of wood with nothing attached but spare PS, CPU, and mem (using a screw driver to short pins instead of power switch). Used a new power cable plugged into a different circuit. Nothing. Try new mem. Get another system and individually check mem, motherboard, cpu. They are all good. Try both PS's in other system and problem follows them. Two bad power supplies -- not too unusual. I replace them, and things run great.

5th Occurrence (recently). I removed all cards, drives from motherboard. Nothing. Tried spare PS. That worked. Unplugged current PS from case, HD, FD, it started working. Put everything back together and it was still working.

Since there is not a single piece of hardware that was present in each case, I feel forced to conclude that there must be something (power cord?) that is breaking the power supplies. I have not seen this problem on any other computers. This is the point at which I would love to put the whole computer back in a box and send it to the reseller. Luckily we never sent back the "bad" motherboard and kept it around as a spare, since it works fine on other systems now. David From patrick at myri.com Wed Nov 28 14:37:32 2001 From: patrick at myri.com (Patrick Geoffray) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? References: <200111281048.LAA10712@dylandog.crs4.it> <3C0555CF.1C29FD19@myri.com> <20011128144747.V18934@cs.ualberta.ca> Message-ID: <3C05672C.E77C9DBF@myri.com> Hi Paul, Paul Lu wrote: > To the extent that you can/are allowed, would you care to comment on > how well these boards perform, especially wrt 64-bit/66 MHz Myrinet > interfaces? Unfortunately, I cannot give this information, it's under NDA. I can just say it's good. It's not easy to make a good 64/66 PCI, and AMD did a good job. I expect the next pre-release to be even better. I will send the results to Greg to publish on his web site as soon as the NDA is over. I can also tell you that my next cluster will definitely be based on this machine. Patrick From math at velocet.ca Wed Nov 28 15:19:49 2001 From: math at velocet.ca (Velocet) Date: Wed Nov 25 01:01:54 2009 Subject: custom hardware (was: Xbox clusters?) In-Reply-To: ; from dvos12@calvin.edu on Wed, Nov 28, 2001 at 05:11:09PM -0500 References: <3C054E04.83608604@gats-inc.com> Message-ID: <20011128181949.K1210@velocet.ca> On Wed, Nov 28, 2001 at 05:11:09PM -0500, David Vos's all... > On Wed, 28 Nov 2001, John Burton wrote: > > Ummmm....speak for yourself. I've been putting together these "self > > assembled beige box" for many years and currently have about 5% > > component DOA rate, and about another 1% infant mortality rate (crap > There is one computer in our cluster that would make me think twice before > doing a custom build. I prefer to call it the node from heck. It only > has one problem: it won't boot.
If you press the power button, the > powerlight flashes while the cpu and case fans turn a quarter turn, then > nothing. You have to wait a minute before you even get that reaction > again. (Sounds like a short somewhere). The problem only surfaces if the > computer has been off for a little while, and nearly every time at that. > Since there is not a single piece of hardware that was present in each > case, I feel forced to conclude that there must be something (power cord?) > that is breaking the power supplies. I have not seen this problem on any > other computers. This is the point at which I would love to put the whole > computer back in a box and send it to the reseller. I saw this EXTREMELY SIMILAR type of situation when I went and fried 3 power supplies in a row trying to boot dual athlons on the Tiger XMP board. :) They ran fine for 1-5 minutes then the power supply blew. Then the power supply would never fully turn on again, just a quarter turn of the fan kinda thing. Those were 300W supplies, and you need 350W's (30A min on +5V, the 300s were 25A, the 350s do 32A) to run the dual board. Now things are fine (enermax 350W supplies are nice). So it might be that... What kinda cpu, how many drives, how much ram and how big are your supplies? Anyway, this kind of event DOESNT excuse the XBOX from having these problems too, except you dont get to return it and you dont get to take it apart to see which particular component piece in which combination displays the problem. /kc -- Ken Chase, math@velocet.ca * Velocet Communications Inc. * Toronto, CANADA From beerli at genetics.washington.edu Wed Nov 28 17:03:46 2001 From: beerli at genetics.washington.edu (Peter Beerli) Date: Wed Nov 25 01:01:54 2009 Subject: mpi-prog porting from lam -> scyld beowulf mpi difficulties Message-ID: Hi, I have a program developed using MPI-1 under LAM. It runs fine on several LAM-MPI clusters with different architecture. A user wants to run it on a Scyld-beowulf cluster and there it fails. I did a few tests myself and it seems that the program stalls if run on more than 3 nodes, but seems to work for 2-3 nodes. The program has a master-slave architecture where the master is mostly doing nothing. There are some reports sent to stdout from any node (but this seems to work in beompi the same way as in LAM). There are several things unclear to me because I have no clue about the beompi system, beowulf and scyld in particular.

(1) if I run "top" why do I see 6 processes running when I start with mpirun -np 3 migrate-n ?

(2) The data-phase stalls on the slave nodes. The master node is reading the data from a file and then broadcasts a large char buffer to the slaves. Is this wrong, is there a better way to do that [I do not know how big the data is and it is a complex mix of strings numbers etc.]
void
broadcast_data_master (data_fmt * data, option_fmt * options)
{
  long bufsize;
  char *buffer;
  buffer = (char *) calloc (1, sizeof (char));
  bufsize = pack_databuffer (&buffer, data, options);
  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
  free (buffer);
}

void
broadcast_data_worker (data_fmt * data, option_fmt * options)
{
  long bufsize;
  char *buffer;
  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
  buffer = (char *) calloc (bufsize, sizeof (char));
  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
  unpack_databuffer (buffer, data, options);
  free (buffer);
}

the master and the first node seem to read the data fine but the others either don't and wait or silently die.

(3) what is the easiest way to debug this? With LAM I just attached to the pids in gdb on the different nodes, but here the nodes are transparent to me [but as I said I have never used a beowulf cluster before]. Can you give pointers, hints thanks Peter -- Peter Beerli, Genome Sciences, Box #357730, University of Washington, Seattle WA 98195-7730 USA, Ph:2065438751, Fax:2065430754 http://evolution.genetics.washington.edu/PBhtmls/beerli.html From daniel.pfenniger at obs.unige.ch Thu Nov 29 00:15:15 2001 From: daniel.pfenniger at obs.unige.ch (Daniel Pfenniger) Date: Wed Nov 25 01:01:54 2009 Subject: custom hardware (was: Xbox clusters?) References: Message-ID: <3C05EE93.955F54DF@obs.unige.ch> David Vos wrote: > .... > There is one computer in our cluster that would make me think twice before > doing a custom build. I prefer to call it the node from heck. It only > has one problem: it won't boot. If you press the power button, the > powerlight flashes while the cpu and case fans turn a quarter turn, then > nothing. You have to wait a minute before you even get that reaction > again. (Sounds like a short somewhere). The problem only surfaces if the > computer has been off for a little while, and nearly every time at that. I have seen similar strange behavior of some boxes in a set of 66's, and the way to restart is also rather odd. Basically, and this has been repeatedly observed on several boxes of the same composition (dual Pentium III with ASUS P2BD motherboard) aligned on a metallic shelf, the ATX box would stop after months of activity, and the simplest way we found to restart it is to unplug everything (power and ethernet), touch it for a few seconds with hands, replug and voila. No need to open the box! My guess is that some capacitor needs to be discharged, but exactly why one needs to unplug every cable appears curious. Dan From rauch at inf.ethz.ch Thu Nov 29 02:42:31 2001 From: rauch at inf.ethz.ch (Felix Rauch) Date: Wed Nov 25 01:01:54 2009 Subject: Strange hardware (was Re: custom hardware (was: Xbox clusters?)) In-Reply-To: <3C05EE93.955F54DF@obs.unige.ch> Message-ID: On Thu, 29 Nov 2001, Daniel Pfenniger wrote: > I have seen similar strange behavior of some boxes in a set of 66's, > and the way to restart is also rather odd. [...] We recently had strange problems with a Dell-Box which has been working without problems for several years in our small research cluster. It's a dual PII 400 MHz box, but suddenly the Linux kernel was unable to start the second CPU. It could see the second CPU, but when it tried to start it up during boot, it got a timeout and so continued with only one CPU. So we thought that one of the CPUs died and replaced both CPUs. Still the same problem. Next we replaced the motherboard (including the power supply).
Still the same problem. Maybe the disk corrupted the kernel, so we installed a fresh version of the same kernel onto the box. Still the same problem. Only after physically replacing the SCSI hard disk everything was working properly again. We are still wondering why a disk could cause a CPU to timeout during boot... - Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 From rcferri at us.ibm.com Thu Nov 29 04:09:20 2001 From: rcferri at us.ibm.com (Richard C Ferri) Date: Wed Nov 25 01:01:54 2009 Subject: Marist Beowulf Setup Message-ID: Hi, Can anyone take a look at Anthony's problem below and help a poor college student building a scyld cluster? thanks, Rich ---------------------- Forwarded by Richard C Ferri/Poughkeepsie/IBM on 11/29/2001 07:08 AM --------------------------- Anthony Sofia on 11/28/2001 12:52:46 PM To: Richard C Ferri/Poughkeepsie/IBM@IBMUS cc: Jose.Arreola@mairst.edu Subject: Marist Beowulf Setup I have a couple of problems/questions that you might be able to help with. (This is all based on scyld) The first problem is the beoserv and bpmaster daemons are binding to -1 instead of an address(192.168.1.1). The nodes are able to get their IP addresses via rarp, but when it tries to connect to the master node(192.168.1.1:1555) to get the second level boot image, the slave nodes stall. When doing a netstat on the master node, it says an established tcp connection exists between .-1:1555 and .0:(some port). During this, no data is being transferred over the network, so I am sceptical that the tcp connection actually exists. I am going to start looking into this, but I thought you might have a quick answer that would make me not have to dig through code and strace output all afternoon. =) I think my other issues can be solved once I have this problem fixed. Thanks for any advice and suggestions you can give me. Anthony Sofia -- anthony@dryhump.net From Mark at MarkAndrewSmith.co.uk Thu Nov 29 05:50:58 2001 From: Mark at MarkAndrewSmith.co.uk (Mark@MarkAndrewSmith.co.uk) Date: Wed Nov 25 01:01:54 2009 Subject: Strange hardware (was Re: custom hardware (was: Xbox clusters?)) Message-ID: <61DC272A66B8D211BA8200105ADF2D3910E71C@SERVER01> Yep, seen this problem many times in our computer hire range of Windows2000Pro machines. The strange thing is that we only see this on Slot 1 Pentium II machines with various model motherboards. All our Pentium III range are socket 370 and no problems. So we came to suspect that the problem was the way in which the Slot 1 Pentium II sits on the motherboard. After months of clients returning equipment to base under warranty, we issued instructions on how to open the case and remove and re-seat the Pentium II Slot 1 processor package. The machines then boot every time after switch on. How many of you having this problem have it with the slot 1 Pentium II and slot 2 Pentium III processors in your clusters? I bet none of you have it with a socket 370 or other "flat" socket type of CPU package. We're fortunate that our development cluster is based on Pentium 233MHz MMX "old" ex-hire equipment so we don't have this problem on the cluster. Yet! Regards, Mark.
-----Original Message----- From: Felix Rauch [SMTP:rauch@inf.ethz.ch] Sent: Thursday 29 November 2001 12:00 To: beowulf@beowulf.org Subject: Strange hardware (was Re: custom hardware (was: Xbox clusters?)) On Thu, 29 Nov 2001, Daniel Pfenniger wrote: > I have seen similar strange behavior of some boxes in a set of 66's, > and the way to restart is also rather odd. [...] We recently had strange problems with a Dell-Box which has been working without problems for several years in our small research cluster. It's a dual PII 400 MHz box, but suddenly the Linux kernel was unable to start the second CPU. It could see the second CPU, but when it tried to start it up during boot, it got a timeout and so continued with only one CPU. So we thought that one of the CPUs died and replaced both CPUs. Still the same problem. Next we replaced the motherboard (including the power supply). Still the same problem. Maybe the disk corrupted the kernel, so we installed a fresh version of the same kernel onto the box. Still the same problem. Only after physically replacing the SCSI hard disk everything was working properly again. We are still wondering why a disk could cause a CPU to timeout during boot... - Felix -- Felix Rauch | Email: rauch@inf.ethz.ch Institute for Computer Systems | Homepage: http://www.cs.inf.ethz.ch/~rauch/ ETH Zentrum / RZ H18 | Phone: ++41 1 632 7489 CH - 8092 Zuerich / Switzerland | Fax: ++41 1 632 1307 _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From becker at scyld.com Thu Nov 29 06:01:50 2001 From: becker at scyld.com (Donald Becker) Date: Wed Nov 25 01:01:54 2009 Subject: Marist Beowulf Setup In-Reply-To: Message-ID: On Thu, 29 Nov 2001, Richard C Ferri wrote: > Anthony Sofia on 11/28/2001 12:52:46 PM > > I have a couple of problems/questions that > you might be able to help with. (This is all based on scyld) > > The first problem is the beoserv and bpmaster daemons are binding > to -1 instead of an address(192.168.1.1). The Scyld Beowulf system has special host names for cluster components.

  .0, .1 ...   Compute (slave) nodes
  .-1          Front-end (master) nodes

Note the leading ".", which makes this a hostname instead of a number. This hostname syntax is a valid local text hostname for library routines. It won't be misinterpreted as a valid Internet DNS hostname, or an integer which would be interpreted as an IP number. With this hostname form we can avoid the overhead or serialization of hostname lookups by algorithmically translating to an IP address. We parse the number and add it to the base IP address of the cluster nodes, usually 192.168.1.100. (Implementation note: the correct netmask is required for this to work with more than 154 hosts.) > The nodes are able to get > their IP addresses via rarp, but when it tries to connect to > the master node(192.168.1.1:1555) to get the second level > boot image, the slave nodes stall. The leading causes of this are

  A network problem
    Switches set to forced-full-duplex won't work because there is
    no way to set driver parameters during boot
    Report the device driver version and detection message.
    The driver errata list is always changing with the introduction
    of new, not-quite-compatible chips

  A version mismatch between the master and boot disks
    Due to a change in the Scyld boot protocol, the boot floppy/CD-ROM
    must match the master.

> When doing a netstat on the > master node, it says an established tcp connection exists > between .-1:1555 and .0:(some port). During this, no data is > being transferred over the network, so I am sceptical that the > tcp connection actually exists. Yes, netstat is accurately reporting the connection. An established connection indicates that at least a few packets got through. That reduces the likelihood of a device driver problem, but you might still have a bogus switch configuration. > I am going to start looking into this, but I thought you > might have a quick answer that would make me not have to > dig through code and strace output all afternoon. =) Using 'strace' likely won't be as useful as 'tcpdump'. But just monitoring network traffic with /proc/net/dev should give a good indication of what is occurring. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From SGaudet at turbotekcomputer.com Thu Nov 29 05:59:33 2001 From: SGaudet at turbotekcomputer.com (Steve Gaudet) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? Message-ID: <3450CC8673CFD411A24700105A618BD6170FA5@911TURBO> Hello, > > On Wed, Nov 28, 2001 at 02:04:55PM -0500, Velocet's all... > > > On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all... > > > > Microsoft's Xbox packages a 733 MHz Pentium III, 64 > megabytes of memory, > > > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard > disk for about > > > > $300. This would make it a reasonably powerful cluster > node with an > > > > excellent price/performance ratio. Of course, the thing runs a > > > > slimmed-down variant of Windows 2000 instead of Linux, > but has anyone > > > > discussed making an Xbox cluster? > > > > > > Why bother when for about $300 USD you can put together a > > > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > > > > > Sides, who brought 'price/performance' onto this list? > Dont know thats never a > > > factor on the beowulf list? :) > > > > So, the question is, with these numbers, how do people end > up spending > > $250K on 40 or even 60-CPU clusters? > > A low cost system can be built when using MicroATX cases with 145w ps, costs $35.00 and up. For motherboards, I'd look at solid performers like Intel's D815EGEWLU and S815EBM1(1u bd). Here's the list of approved case options. http://www.formfactors.org/searchproducts.asp# Intel's motherboards have a 3 year warranty and don't have some flaky problems seen on clones. http://program.intel.com/shared/products/boards/d815egew/index.htm http://program.intel.com/shared/products/servers/boards/S815EBM1/index.htm The s815ebm1 is a slick motherboard, built for a 1u case and supports Tualatin, costs about $35.00 more than the d815egewlu. The nice thing is they both have video, fast ethernet, ATA100, etc... > They buy from IBM/Compaq/HP or pick your favorite mainstream vendor. If you find a Compaq GEM partner(we are), you fall into the Government, Educational, and Medical category, and you can't beat the deals Compaq is offering right now. For New England they have an Evo D500, PIV 1.5Ghz, 845, 20Gb, 256mb, WIN2000, CD for $667.00 up to December 12th. Moreover, if it's a quantity order they do even better on the price.
FYI: This deal might be available elsewhere, don't know. Cheers, Steve Gaudet Linux Solutions Engineer ..... <(???)>

===================================================================
| Turbotek Computer Corp.   tel:603-666-3062 ext. 21              |
| 8025 South Willow St.     fax:603-666-4519                      |
| Building 2, Unit 105      toll free:800-573-5393                |
| Manchester, NH 03103      e-mail:sgaudet@turbotekcomputer.com   |
| web: http://www.turbotekcomputer.com                            |
===================================================================

> From j.c.burton at gats-inc.com Thu Nov 29 06:46:49 2001 From: j.c.burton at gats-inc.com (John Burton) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? References: <200111281048.LAA10712@dylandog.crs4.it> <3C0555CF.1C29FD19@myri.com> <20011128144747.V18934@cs.ualberta.ca> <3C05672C.E77C9DBF@myri.com> Message-ID: <3C064A59.F18CDAB2@gats-inc.com> Patrick Geoffray wrote: > Hi Paul, > > Paul Lu wrote: > > > To the extent that you can/are allowed, would you care to comment on > > how well these boards perform, especially wrt 64-bit/66 MHz Myrinet > > interfaces? > > Unfortunately, I cannot give this information, it's under NDA. > I can just say it's good. It's not easy to make a good 64/66 PCI, > and AMD did a good job. > I expect the next pre-release to be even better. > > I will send the results to Greg to publish on his web site as > soon as the NDA is over. > I can also tell you that my next cluster will definitely > be based on this machine. > Greetings! I am currently in the process of upgrading an existing cluster used for coarse grain processing (divide input data file into several chunks and process each chunk on separate nodes). Each of the current nodes is a SuperMicro 6010H (SuperMicro 370DER motherboard, serverworks HE-SL chipset) with 2GB of memory and dual 1Ghz Pentium III processors. I'm looking at a 1U product, the AAPRO 1124 which has a Tyan motherboard with 2GB DDR RAM, dual Athlon MP 1800+ processors. Networking is/will be dual 10/100 FDX NICs in a channel bonded config. Does anyone have a feel for how the two systems compare (dual 1Ghz PIII vs dual Athlon 1800+)? Also, will the AMD 760 MPX chipset be a significant enough improvement over the AMD 760MP to warrant waiting (how long???). And finally, since my supplier is a Tyan partner, it's much easier to get Tyan boards - is Tyan coming out with an AMD 760 MPX based dual athlon motherboard? Inquiring minds want to know!!! John From bob at drzyzgula.org Thu Nov 29 07:02:00 2001 From: bob at drzyzgula.org (Bob Drzyzgula) Date: Wed Nov 25 01:01:54 2009 Subject: custom hardware (was: Xbox clusters?) In-Reply-To: <3C05EE93.955F54DF@obs.unige.ch>; from daniel.pfenniger@obs.unige.ch on Thu, Nov 29, 2001 at 09:15:15AM +0100 References: <3C05EE93.955F54DF@obs.unige.ch> Message-ID: <20011129100200.A14075@www.snappity.org> On Thu, Nov 29, 2001 at 09:15:15AM +0100, Daniel Pfenniger wrote: > > David Vos wrote: > > > .... > > There is one computer in our cluster that would make me think twice before > > doing a custom build. I prefer to call it the node from heck. It only > > has one problem: it won't boot. If you press the power button, the > > powerlight flashes while the cpu and case fans turn a quarter turn, then > > nothing. You have to wait a minute before you even get that reaction > > again. (Sounds like a short somewhere). The problem only surfaces if the > > computer has been off for a little while, and nearly every time at that.
> > I have seen similar strange behavior of some boxes in a set of 66's, and the > way to restart is also rather odd. > Basically, and this has been repeatedly observed on several boxes of the same > composition (dual Pentium III with ASUS P2BD motherboard) aligned on a metallic > shelf, the ATX box would stop after months of activity, and the simplest found > way to restart it is to unplug everything (power and ethernet), touch it for > a few seconds with hands, replug and voila. No need to open the box! > My guess is that some condensator needs to be unloaded, but exactly why > one needs to unplug every cable appears curious. One thing to understand is that, unless there is a physical switch on the power supply itself, ATX systems are never *really* turned off as long as they are plugged in -- they only go to a "standby" state, wherein +5V power is still being applied to a single pin (the purple wire). When you press the power button on the front of the chassis, it merely shorts a header that ultimately causes the motherboard to short the green wire in the ATX cable to ground -- this is a signal to the power supply to leave standby and start generating power for all the other outputs. Another thing to observe is that generally, ATX power supplies are switching supplies, which means that (to simplify things somewhat) they generate the correct voltage by charging and discharging a capacitor at a high rate. The switching controller constantly monitors the voltage on the capacitor and connects or disconnects the capacitor to the incoming supply, depending on whether the charge is above or below the desired level (the detailed truth behind this is fairly complex and typically involves multiple stages and inductors as well as capacitors, but this model is probably good enough for this discussion...). Thus, even when an ATX system is "off", the power supply is chugging along, keeping a capacitor charged to provide +5V at a low current. BTW, if you have the resources to do this, put a current sensor on the incoming AC line for a running system and feed the output to an oscilloscope. You should see a series of alternating positive and negative spikes -- those are the capacitors charging at the peaks and troughs of the AC voltage. Now, if the ATX board were simply to run the green-wire contact straight through to the power on/off header, you wouldn't need much oomph at all on the +5V standby line, and older ATX power supplies in fact didn't. However, newer boards have things like Wake-on-LAN, Wake-on-Modem, and other various and sundry goodies that have to run off the +5V standby. It has gotten to the point that, in order to do all the processing that is required to leave standby, the standby current draw is greater than what some older supplies can provide. So in the case of a power supply that either by design or fault cannot provide sufficient current under standby, what (I think) happens is that while the motherboard is waiting for the main supply voltages to come up to full power, the standby processing bleeds off the capacitor to the point that the standby voltage sags below the minimum required for operation. At that point, the standby processing halts, the motherboard stops holding the green wire to ground, and the power supply stops trying to power up. It then returns to standby mode, re-charges the standby capacitor, and the cycle begins again. 
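The loop described above is easier to see written out. A toy C model of that feedback cycle -- every number in it is invented purely for illustration, and a real supply is an analog device that is far messier than this:

#include <stdio.h>

/* Toy model of the power-up race: pressing the button makes the board
 * hold PS_ON (the green wire) low while it waits for the main rails;
 * meanwhile the +5V standby rail sags under the standby load.  If
 * +5VSB drops below the standby logic's minimum before the rails are
 * up, the board releases PS_ON, the supply falls back to standby, the
 * capacitor recharges, and the cycle repeats. */
int main (void)
{
    double vsb = 5.0;              /* +5VSB when the button is pressed    */
    const double sag_per_ms = 0.4; /* a weak supply: standby rail droops  */
    const double vmin = 3.0;       /* standby logic browns out below this */
    const int rails_up_ms = 10;    /* time the main rails need to ramp up */

    for (int t = 1; t <= rails_up_ms; t++) {
        vsb -= sag_per_ms;
        if (vsb < vmin) {
            printf ("t=%2d ms: +5VSB sagged to %.1f V -> PS_ON released, "
                    "back to standby, recharge, and around we go again\n",
                    t, vsb);
            return 0;
        }
    }
    printf ("main rails up before the standby rail browned out: power on\n");
    return 0;
}

Drop sag_per_ms to something a healthy supply could manage (say 0.1) and the loop completes -- which is the difference between the boxes that start and the ones that give the single quarter-turn of the fans described earlier in the thread.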
If you have a system that is behaving like this, try putting a voltmeter on the standby pin of the ATX header (you can usually jab a probe down into the back of the connector). You should see it at +5V when the system is "off". Then press the system's "on" button and watch the voltage. You'll most likely see it sag down to a couple of volts or so. If this doesn't happen, you've probably got some other problem, perhaps a POST failure of some sort. Also, this may not be the end of the diagnosis -- it is possible that the failure to provide enough current on standby may not be the fault of the power supply itself. It could be a faulty component (e.g. the SCSI drive we heard about) sucking down too much current on power-up, or an overburdened AC supply circuit that sags just a bit when your system starts up -- in the latter case I imagine that you could wind up with a seemingly jinxed spot in the equipment rack. :-) BTW, if the power supply has too little oomph on standby by *design*, the system will probably *never* power up. If the supply's design meets the new spec only marginally, or if it is malfunctioning, say, because of a damaged or weakened capacitor, then it might behave differently when cold than it does when it is fully warmed up. In this event, unplugging the supply for a while and reconnecting it can create a short window in which the supply can get the system over the hump to leave standby. I in fact have a supply at home that has this problem, and I just sort of live with it because it's not my main system. Someday perhaps I'll replace the supply. As to why you have to disconnect the Ethernet as well, I really don't have a clue. HTH, --Bob Drzyzgula From lindahl at conservativecomputer.com Thu Nov 29 07:24:55 2001 From: lindahl at conservativecomputer.com (Greg Lindahl) Date: Wed Nov 25 01:01:54 2009 Subject: AMD 760 MPX ? In-Reply-To: <3C064A59.F18CDAB2@gats-inc.com>; from j.c.burton@gats-inc.com on Thu, Nov 29, 2001 at 09:46:49AM -0500 References: <200111281048.LAA10712@dylandog.crs4.it> <3C0555CF.1C29FD19@myri.com> <20011128144747.V18934@cs.ualberta.ca> <3C05672C.E77C9DBF@myri.com> <3C064A59.F18CDAB2@gats-inc.com> Message-ID: <20011129102455.A3390@wumpus.foo> On Thu, Nov 29, 2001 at 09:46:49AM -0500, John Burton wrote: > Also, will the AMD 760 MPX chipset be a significant enough > improvement over the AMD 760MP to warrant waiting (how long???). The main improvement of the MPX chipset over the MP is better PCI bandwidth. Given that you are only using bonded fast Ethernet, you won't notice a difference. The reason people care about PCI bandwidth is things like Myrinet and SCSI/IDE RAID, which need a lot of bandwidth. greg From alvin at iplink.net Thu Nov 29 08:06:58 2001 From: alvin at iplink.net (alvin) Date: Wed Nov 25 01:01:54 2009 Subject: Xbox clusters? References: Message-ID: <3C065D22.C554211B@iplink.net> "Robert G. Brown" wrote: > > On Wed, 28 Nov 2001, Velocet wrote: > > > On Wed, Nov 28, 2001 at 02:04:55PM -0500, Velocet's all... > > > On Wed, Nov 28, 2001 at 12:50:46PM -0500, Josip Loncaric's all... > > > > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory, > > > > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about > > > > $300. This would make it a reasonably powerful cluster node with an > > > > excellent price/performance ratio. Of course, the thing runs a > > > > slimmed-down variant of Windows 2000 instead of Linux, but has anyone > > > > discussed making an Xbox cluster?
> > > > > > > > Why bother when for about $300 USD you can put together a > > > cluster node with a 1.333GHz athlon with 256Mb of DDR ram? > > > > > > Sides, who brought 'price/performance' onto this list? Dont know thats never a > > > factor on the beowulf list? :) > > > > So, the question is, with these numbers, how do people end up spending > > $250K on 40 or even 60-CPU clusters? > > Well, start with $300 rackmount cases (a rackmount case alone can easily > cost more than an Xbox). Add a high end P4 motherboard, the fastest > P4-Xeon, and fully populate the MoBo with the biggest, most expensive > RDRAM sticks you can find. Get a big, fast SCSI drive and controller. > Finish off with the fastest network you can arrange. [snip] > > This is the Lesson of the Wang. > > (At least for those of you old enough to remember what one is...:-) To put on my humorous hat: I understand that MS is losing cash on each Xbox they sell. Possibly they are looking to do something like Kodak did in its early days, where they sold cameras much cheaper than it cost to produce them so that they could make it up on the film. Well, with the exception that MS wants to sell software. Possibly everybody should go out and buy an Xbox. If we buy enough then we may be able to put the EVIL EMPIRE out of business. And if we all install Linux on the Xboxes then MS will lose out on the ongoing SW sales. -- Alvin Starr || voice: (416)785-4051 Interlink Connectivity || fax: (416)785-3668 alvin@iplink.net || From jlong at arsc.edu Thu Nov 29 09:43:52 2001 From: jlong at arsc.edu (James Long) Date: Wed Nov 25 01:01:54 2009 Subject: mpi-prog porting from lam -> scyld beowulf mpi difficulties In-Reply-To: References: Message-ID: The buffer allocation is only one byte in broadcast_data_master. Looks like you should make it big enough for all your data and options before you broadcast it, as there is no telling what might stomp that memory after you pack it and before it gets sent. Jim At 5:03 PM -0800 11/28/01, Peter Beerli wrote: >Hi, >I have a program developed using MPI-1 under LAM. >It runs fine on several LAM-MPI clusters with different architecture. >A user wants to run it on a Scyld-beowulf cluster and there it fails. >I did a few tests myself and it seems >that the program stalls if run on more than 3 nodes, but seems to work for >2-3 nodes. The program has a master-slave architecture where the master >is mostly doing nothing. There are some reports sent to stdout from any node >(but this seems to work in beompi the same way as in LAM). >There are several things unclear to me >because I have no clue about the beompi system, beowulf and scyld in >particular. > >(1) if I run "top" why do I see 6 processes running when I start > with mpirun -np 3 migrate-n ? > >(2) The data-phase stalls on the slave nodes. > The master node is reading the data from a file and then broadcasts > a large char buffer to the slaves. Is this wrong, is there a better way > to do that [I do not know how big the data is and it is a complex mix > of strings numbers etc.]
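James's suggestion in concrete form, as a minimal sketch -- an illustration only, not the actual migrate-n source: data_fmt, option_fmt, pack_databuffer() and comm_world are the names from Peter's post, and MAX_DATA_BUFFER is a made-up worst-case bound standing in for whatever limit migrate-n can actually compute:

#include <stdlib.h>
#include <mpi.h>

#define MASTER 0                          /* assumed rank of the master          */
#define MAX_DATA_BUFFER (8 * 1024 * 1024) /* invented bound, for the sketch only */

/* data_fmt, option_fmt, pack_databuffer() and comm_world are from
 * migrate-n, as quoted in Peter's post. */
void
broadcast_data_master (data_fmt * data, option_fmt * options)
{
  long bufsize;
  /* allocate the full worst case up front instead of calloc (1, ...),
   * so packing can never write past the end of the buffer */
  char *buffer = (char *) calloc (MAX_DATA_BUFFER, sizeof (char));
  bufsize = pack_databuffer (&buffer, data, options);
  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
  free (buffer);
}

broadcast_data_worker needs no change, since it already sizes its calloc() from the broadcast bufsize; the cleaner alternative is for pack_databuffer() to realloc() the buffer as it packs -- presumably why it takes &buffer -- and return the final size. Peter's routines as originally posted: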
>
>void
>broadcast_data_master (data_fmt * data, option_fmt * options)
>{
>  long bufsize;
>  char *buffer;
>  buffer = (char *) calloc (1, sizeof (char));
>  bufsize = pack_databuffer (&buffer, data, options);
>  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
>  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
>  free (buffer);
>}
>
>void
>broadcast_data_worker (data_fmt * data, option_fmt * options)
>{
>  long bufsize;
>  char *buffer;
>  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
>  buffer = (char *) calloc (bufsize, sizeof (char));
>  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
>  unpack_databuffer (buffer, data, options);
>  free (buffer);
>}
>
>    the master and the first node seem to read the data fine
>    but the others either don't and wait or silently die.
>
>(3) what is the easiest way to debug this? With LAM I just attached to the pids
>    in gdb on the different nodes, but here the nodes are transparent to me
>    [but as I said I have never used a beowulf cluster before].
>
>
>Can you give pointers, hints
>
>thanks
>Peter
>-- 
>Peter Beerli, Genome Sciences, Box #357730, University of Washington,
>Seattle WA 98195-7730 USA, Ph:2065438751, Fax:2065430754
>http://evolution.genetics.washington.edu/PBhtmls/beerli.html
>
>
>
>_______________________________________________
>Beowulf mailing list, Beowulf@beowulf.org
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf

-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
James Long
MPP Specialist
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks, AK 99775-6020

jlong@arsc.edu
(907) 474-5731 work
(907) 474-5494 fax
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

From jonathan at meanwhile.freeserve.co.uk  Thu Nov 29 10:27:04 2001
From: jonathan at meanwhile.freeserve.co.uk (Jonathan Coupe)
Date: Wed Nov 25 01:01:54 2009
Subject: Re. XBox clusters
Message-ID: <001a01c17903$7294c620$2901893e@baby>

----- Original Message -----
From: "Josip Loncaric" 
To: "Beowulf mailing list" 
Sent: Wednesday, November 28, 2001 5:50 PM
Subject: Xbox clusters?

> Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory,
> a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about
> $300. This would make it a reasonably powerful cluster node with an
> excellent price/performance ratio. Of course, the thing runs a
> slimmed-down variant of Windows 2000 instead of Linux, but has anyone
> discussed making an Xbox cluster?
>
> Sincerely,
> Josip
>

I remember people speculating in a similar way re. the Dreamcast. (I did.)
In practice I doubt that a game console will ever be a better bet for
clustering than a PC. Firstly, most of the transistor budget goes into the
3D card, where it's effectively useless for us. Secondly, PCs track the
price of CPUs, etc., much more quickly than consoles. If a console was
*really* heavily subsidised by its maker - consoles usually are subsidised
at launch time - it could start cheaper than the PC. But in a few months it
would have lost this price advantage.

- Jonathan Coupe

From joelja at darkwing.uoregon.edu  Thu Nov 29 10:33:54 2001
From: joelja at darkwing.uoregon.edu (Joel Jaeggli)
Date: Wed Nov 25 01:01:54 2009
Subject: Re. XBox clusters
In-Reply-To: <001a01c17903$7294c620$2901893e@baby>
Message-ID: 

I'd also note that in my latest pc-connection catalog... 1.1ghz celerons
with 128MB of ram and 20GB drives and nics from compaq are $499.
Myself I prefer to build them rather than use shrinkwrap pc's but
passable boxes with warranties are out there...

joelja

On Thu, 29 Nov 2001, Jonathan Coupe wrote:

> ----- Original Message -----
> From: "Josip Loncaric" 
> To: "Beowulf mailing list" 
> Sent: Wednesday, November 28, 2001 5:50 PM
> Subject: Xbox clusters?
>
> > Microsoft's Xbox packages a 733 MHz Pentium III, 64 megabytes of memory,
> > a DVD drive, 100 Mbps Ethernet, and an 8-gigabyte hard disk for about
> > $300. This would make it a reasonably powerful cluster node with an
> > excellent price/performance ratio. Of course, the thing runs a
> > slimmed-down variant of Windows 2000 instead of Linux, but has anyone
> > discussed making an Xbox cluster?
> >
> > Sincerely,
> > Josip
> >
>
> I remember people speculating in a similar way re. the Dreamcast. (I did.)
> In practice I doubt that a game console will ever be a better bet for
> clustering than a PC. Firstly, most of the transistor budget goes into the
> 3D card, where it's effectively useless for us. Secondly, PCs track the
> price of CPUs, etc., much more quickly than consoles. If a console was
> *really* heavily subsidised by its maker - consoles usually are subsidised
> at launch time - it could start cheaper than the PC. But in a few months it
> would have lost this price advantage.
>
> - Jonathan Coupe
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

-- 
--------------------------------------------------------------------------
Joel Jaeggli                              joelja@darkwing.uoregon.edu
Academic User Services                    consult@gladstone.uoregon.edu
PGP Key Fingerprint: 1DE9 8FCA 51FB 4195 B42A 9C32 A30D 121E
--------------------------------------------------------------------------
It is clear that the arm of criticism cannot replace the criticism of
arms. Karl Marx -- Introduction to the critique of Hegel's Philosophy
of the right, 1843.

From SGaudet at turbotekcomputer.com  Thu Nov 29 10:39:02 2001
From: SGaudet at turbotekcomputer.com (Steve Gaudet)
Date: Wed Nov 25 01:01:54 2009
Subject: AMD 760 MPX ?
Message-ID: <3450CC8673CFD411A24700105A618BD6170FB0@911TURBO>

Hello,

> Greetings! I am currently in the process of upgrading an
> existing cluster used for coarse-grain processing
> (divide input data file into several chunks and process each
> chunk on separate nodes). Each of the current nodes
> is a SuperMicro 6010H (SuperMicro 370DER motherboard,
> serverworks HE-SL chipset) with 2GB of memory and dual 1Ghz
> Pentium III processors. I'm looking at a 1U product, the
> AAPRO 1124 which has a Tyan motherboard with 2GB DDR
> RAM, dual Athlon MP 1800+ processors. Networking is/will be
> dual 10/100 FDX NICs in a channel bonded config.
> Does anyone have a feel for how the two systems compare (dual
> 1Ghz PIII vs dual Athlon 1800+). Also, will the
> AMD 760 MPX chipset be a significant enough improvement over
> the AMD 760MP to warrant waiting (how long???). And
> finally, since my supplier is a Tyan partner, it's much easier
> to get Tyan boards - is Tyan coming out with an AMD
> 760 MPX based dual athlon motherboard? Inquiring minds want
> to know!!!

Just in from Tyan. The 2466 (Tiger) will be available for sampling
starting next week. The 2468 (Thunder) will be available most likely
the beginning of January.

Cheers,

Steve Gaudet
Linux Solutions Engineer
.....
<(???)>
===================================================================
| Turbotek Computer Corp.      tel:603-666-3062 ext. 21           |
| 8025 South Willow St.        fax:603-666-4519                   |
| Building 2, Unit 105         toll free:800-573-5393             |
| Manchester, NH 03103         e-mail:sgaudet@turbotekcomputer.com |
| web: http://www.turbotekcomputer.com                            |
===================================================================

From jharrop at shaw.ca  Thu Nov 29 11:20:58 2001
From: jharrop at shaw.ca (J Harrop)
Date: Wed Nov 25 01:01:54 2009
Subject: custom hardware (was: Xbox clusters?)
In-Reply-To: <20011129100200.A14075@www.snappity.org>
References: <"from daniel.pfenniger"@obs.unige.ch> <3C05EE93.955F54DF@obs.unige.ch>
Message-ID: <5.0.2.1.0.20011129110501.009f2a30@shawmail>

We have had similar problems over the years, some of which we tracked
down to poor grounding conditions in the building wiring. I know one
location where the weather (in particular rain) can affect the behavior
of parts of the system. I expect the grounding problem would create
problems with similar symptoms on the newer power supplies - but I
can't give a detailed explanation such as the excellent one posted. I
seem to recall that we also had this problem with the older power
supplies. Solution was the same - unplug, wait, reboot.

My favorite hardware problem was when I was working down in Honduras.
One of the laptops became more and more flaky and finally quit booting
at all. When I swapped out the CD-ROM module to try and boot from a
floppy I found a stray ant sitting on the inside edge of the connector!
On further inspection the inside of the laptop turned out to be packed
with them. I wanted to duct-tape the machine closed and mail the box
back to Dell with a "bug report" taped on it ;-)

John Harrop

At 10:02 AM 29/11/2001 -0500, you wrote:
>On Thu, Nov 29, 2001 at 09:15:15AM +0100, Daniel Pfenniger wrote:
> >
> > David Vos wrote:
> > > ....
> > > There is one computer in our cluster that would make me think twice before
> > > doing a custom build. I prefer to call it the node from heck. It only
> > > has one problem: it won't boot. If you press the power button, the
> > > powerlight flashes while the cpu and case fans turn a quarter turn, then
> > > nothing. You have to wait a minute before you even get that reaction
> > > again. (Sounds like a short somewhere). The problem only surfaces if the
> > > computer has been off for a little while, and nearly every time at that.
> >
> > I have seen similar strange behavior of some boxes in a set of 66's, and the
> > way to restart is also rather odd.
> > Basically, and this has been repeatedly observed on several boxes of the same
> > composition (dual Pentium III with ASUS P2BD motherboard) aligned on a metallic
> > shelf, the ATX box would stop after months of activity, and the simplest found
> > way to restart it is to unplug everything (power and ethernet), touch it for
> > a few seconds with hands, replug and voila. No need to open the box!
> > My guess is that some capacitor needs to be discharged, but exactly why
> > one needs to unplug every cable appears curious.
>
>One thing to understand is that, unless there is a physical
>switch on the power supply itself, ATX systems are never
>*really* turned off as long as they are plugged in -- they
>only go to a "standby" state, wherein +5V power is still
>being applied to a single pin (the purple wire).
>When you press the power button on the front of the chassis,
>it merely shorts a header that ultimately causes the
>motherboard to short the green wire in the ATX cable to
>ground -- this is a signal to the power supply to leave
>standby and start generating power for all the other
>outputs.
>
>Another thing to observe is that generally, ATX power
>supplies are switching supplies, which means that (to
>simplify things somewhat) they generate the correct voltage
>by charging and discharging a capacitor at a high rate. The
>switching controller constantly monitors the voltage on the
>capacitor and connects or disconnects the capacitor to the
>incoming supply, depending on whether the charge is above or
>below the desired level (the detailed truth behind this is
>fairly complex and typically involves multiple stages and
>inductors as well as capacitors, but this model is probably
>good enough for this discussion...). Thus, even when an ATX
>system is "off", the power supply is chugging along, keeping
>a capacitor charged to provide +5V at a low current. BTW, if
>you have the resources to do this, put a current sensor on
>the incoming AC line for a running system and feed the
>output to an oscilloscope. You should see a series of
>alternating positive and negative spikes -- those are the
>capacitors charging at the peaks and troughs of the AC
>voltage.
>
>Now, if the ATX board were simply to run the green-wire
>contact straight through to the power on/off header, you
>wouldn't need much oomph at all on the +5V standby line, and
>older ATX power supplies in fact didn't. However, newer
>boards have things like Wake-on-LAN, Wake-on-Modem, and
>other various and sundry goodies that have to run off the
>+5V standby. It has gotten to the point that, in order to
>do all the processing that is required to leave standby, the
>standby current draw is greater than what some older
>supplies can provide. So in the case of a power supply that
>either by design or fault cannot provide sufficient current
>under standby, what (I think) happens is that while the
>motherboard is waiting for the main supply voltages to come
>up to full power, the standby processing bleeds off the
>capacitor to the point that the standby voltage sags below
>the minimum required for operation. At that point, the
>standby processing halts, the motherboard stops holding the
>green wire to ground, and the power supply stops trying to
>power up. It then returns to standby mode, re-charges the
>standby capacitor, and the cycle begins again.
>
>If you have a system that is behaving like this, try putting
>a voltmeter on the standby pin of the ATX header (you can
>usually jab a probe down into the back of the connector).
>You should see it at +5V when the system is "off". Then
>press the system's "on" button and watch the voltage. You'll
>most likely see it sag down to a couple of volts or so. If
>this doesn't happen, you've probably got some other problem,
>perhaps a POST failure of some sort. Also, this may not be
>the end of the diagnosis -- it is possible that the failure
>to provide enough current on standby may not be the fault of
>the power supply itself. It could be a faulty component
>(e.g. the SCSI drive we heard about) sucking down too much
>current on power-up, or an overburdened AC supply circuit
>that sags just a bit when your system starts up -- in the
>latter case I imagine that you could wind up with a
>seemingly jinxed spot in the equipment rack. :-)
>
>BTW, if the power supply has too little oomph on standby by
>*design*, the system will probably *never* power up. If the
>supply's design meets the new spec only marginally, or if it
>is malfunctioning, say, because of a damaged or weakened
>capacitor, then it might behave differently when cold than
>it does when it is fully warmed up. In this event,
>unplugging the supply for a while and reconnecting it can
>create a short window in which the supply can get the system
>over the hump to leave standby. I in fact have a supply at
>home that has this problem, and I just sort of live with it
>because it's not my main system. Someday perhaps I'll
>replace the supply.
>
>As to why you have to disconnect the Ethernet as well, I
>really don't have a clue.
>
>HTH,
>--Bob Drzyzgula
>_______________________________________________
>Beowulf mailing list, Beowulf@beowulf.org
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf

From beerli at genetics.washington.edu  Thu Nov 29 13:19:33 2001
From: beerli at genetics.washington.edu (Peter Beerli)
Date: Wed Nov 25 01:01:54 2009
Subject: mpi-prog porting from lam -> scyld beowulf mpi difficulties
In-Reply-To: 
Message-ID: 

Jim,
the buffer in broadcast_data_master gets allocated to the size needed
inside pack_databuffer() [which returns the allocated size of the
buffer] before the buffer is broadcast (roughly as sketched below).

Peter

On Thu, 29 Nov 2001, James Long wrote:

> The buffer allocation is only one byte in broadcast_data_master.
> Looks like you should make it big enough for all your data and
> options before you broadcast it, as there is no telling what might
> stomp that memory after you pack it and before it gets sent.
>
> Jim
>
> At 5:03 PM -0800 11/28/01, Peter Beerli wrote:
> >Hi,
> >I have a program developed using MPI-1 under LAM.
> >It runs fine on several LAM-MPI clusters with different architecture.
> >A user wants to run it on a Scyld-beowulf cluster and there it fails.
> >I did a few tests myself and it seems
> >that the program stalls if run on more than 3 nodes, but seems to work for
> >2-3 nodes. The program has a master-slaves architecture where the master
> >is mostly doing nothing. There are some reports sent to stdout from any node
> >(but this seems to work in beompi the same way as in LAM).
> >There are several things unclear to me
> >because I have no clue about the beompi system, beowulf and scyld in
> >particular.
> >
> >(1) if I run "top" why do I see 6 processes running when I start
> >    with mpirun -np 3 migrate-n ?
> >
> >(2) The data-phase stalls on the slave nodes.
> >    The master node is reading the data from a file and then broadcasts
> >    a large char buffer to the slaves. Is this wrong, is there a better way
> >    to do that [I do not know how big the data is and it is a complex mix
> >    of strings numbers etc.]
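Roughly, pack_databuffer() behaves like the sketch below -- simplified
from memory; calc_needed_size() is a made-up stand-in for the real size
computation, which walks the data once before packing:

long
pack_databuffer (char **buffer, data_fmt * data, option_fmt * options)
{
  long bufsize;
  /* hypothetical stand-in for the real size computation */
  bufsize = calc_needed_size (data, options);
  /* grow the caller's 1-byte buffer to the full packed size */
  *buffer = (char *) realloc (*buffer, bufsize * sizeof (char));
  /* ... pack options, sequences and numbers into *buffer ... */
  return bufsize;
}

so by the time MPI_Bcast() sees it, the buffer really is bufsize bytes
long. The routines in question: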
> >
> >void
> >broadcast_data_master (data_fmt * data, option_fmt * options)
> >{
> >  long bufsize;
> >  char *buffer;
> >  buffer = (char *) calloc (1, sizeof (char));
> >  bufsize = pack_databuffer (&buffer, data, options);
> >  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
> >  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
> >  free (buffer);
> >}
> >
> >void
> >broadcast_data_worker (data_fmt * data, option_fmt * options)
> >{
> >  long bufsize;
> >  char *buffer;
> >  MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
> >  buffer = (char *) calloc (bufsize, sizeof (char));
> >  MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
> >  unpack_databuffer (buffer, data, options);
> >  free (buffer);
> >}
> >
> >    the master and the first node seem to read the data fine
> >    but the others either don't and wait or silently die.
> >
> >(3) what is the easiest way to debug this? With LAM I just attached to the pids
> >    in gdb on the different nodes, but here the nodes are transparent to me
> >    [but as I said I have never used a beowulf cluster before].
> >
> >
> >Can you give pointers, hints
> >
> >thanks
> >Peter
> >-- 
> >Peter Beerli, Genome Sciences, Box #357730, University of Washington,
> >Seattle WA 98195-7730 USA, Ph:2065438751, Fax:2065430754
> >http://evolution.genetics.washington.edu/PBhtmls/beerli.html
> >
> >
> >
> >_______________________________________________
> >Beowulf mailing list, Beowulf@beowulf.org
> >To change your subscription (digest mode or unsubscribe) visit
> >http://www.beowulf.org/mailman/listinfo/beowulf
>

-- 
Peter Beerli, Genome Sciences, Box #357730, University of Washington,
Seattle WA 98195-7730 USA, Ph:2065438751, Fax:2065430754
http://evolution.genetics.washington.edu/PBhtmls/beerli.html

From wsb at paralleldata.com  Thu Nov 29 13:45:31 2001
From: wsb at paralleldata.com (W Bauske)
Date: Wed Nov 25 01:01:54 2009
Subject: Xbox clusters?
References: <3450CC8673CFD411A24700105A618BD6170FA5@911TURBO>
Message-ID: <3C06AC7B.77FFC84A@paralleldata.com>

Steve Gaudet wrote:
> 
> 
> > They buy from IBM/Compaq/HP or pick your favorite mainstream vendor.
> 
> If you find a Compaq GEM partner(we are), you fall into the Government,
> Educational, and Medical category, and you can't beat the deals Compaq is
> offering right now. For New England they have an Evo D500, PIV 1.5Ghz, 845,
> 20Gb, 256mb, WIN2000, CD for $667.00 up to December 12th. Moreover if it's a
> quantity they do even better on the price.
> 

Wonder why medical? That's big business.

I'm in business to make money with clusters so I guess I wouldn't qualify
for that program. However, I can build an equivalent node for less than
$500. (Skipping the CD and win2k which I have no use for)

d845wnl     $130
P4 1.5ghz   $152
Case/PS     $ 30
20GB disk   $ 63
256MB dimm  $ 30
AGP card    $ 20
==============
total       $425

Shipping would be around $35 delivered to your door. All you need is
a screwdriver to assemble...

The d845wnl has 10/100 built in and is PXE bootable.

If you like P4 1.9Ghz systems, add $120 and you have a screaming node
for $545. (if you like P4's for your codes)

It's amazing how cheap nodes are now.
Wes

From agrajag at scyld.com  Thu Nov 29 15:46:10 2001
From: agrajag at scyld.com (Sean Dilda)
Date: Wed Nov 25 01:01:55 2009
Subject: mpi-prog porting from lam -> scyld beowulf mpi difficulties
In-Reply-To: ; from beerli@genetics.washington.edu on Wed, Nov 28, 2001 at 05:03:46PM -0800
References: 
Message-ID: <20011129184610.C17892@blueraja.scyld.com>

On Wed, 28 Nov 2001, Peter Beerli wrote:

> (1) if I run "top" why do I see 6 processes running when I start
>     with mpirun -np 3 migrate-n ?

Two per node. For every process you want running, it also runs another
one to take care of the MPI network I/O. Our MPI is based off of mpich,
and this is how they have it set up.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 232 bytes
Desc: not available
Url : http://www.scyld.com/pipermail/beowulf/attachments/20011129/ae2ddf85/attachment.bin

From bill at math.ucdavis.edu  Thu Nov 29 20:34:02 2001
From: bill at math.ucdavis.edu (Bill Broadley)
Date: Wed Nov 25 01:01:55 2009
Subject: MPI I/O + nfs
Message-ID: <20011129203402.A26613@sphere.math.ucdavis.edu>

I'm trying to get MPICH-1.2.2.3 MPI I/O + nfs working. I read:
http://www-unix.mcs.anl.gov/mpi/mpich/docs/install/node31.htm

Step 1:
~/private/io> /usr/sbin/rpcinfo -p `hostname` | grep nfs
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs

I'm using clients n1 and n2:
n2:~> mount | grep noac
master:/d0 on /d0 type nfs (rw,nfsvers=3,noac,addr=192.168.0.250)
n1:~> mount | grep noac
master:/d0 on /d0 type nfs (rw,nfsvers=3,noac,addr=192.168.0.250)

Just to make absolutely sure I'm using nfs 3, I ran nfsstat on n1 and
n2 (same result):

Client nfs v2:
null       getattr    setattr    root       lookup     readlink
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
read       wrcache    write      create     remove     rename
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
link       symlink    mkdir      rmdir      readdir    fsstat
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%

Client nfs v3:
null       getattr    setattr    lookup     access     readlink
0       0% 222540 54% 83      0% 10010   2% 52      0% 53      0%
read       write      create     mkdir      symlink    mknod
67772  16% 103571 25% 2070    0% 2       0% 0       0% 0       0%
remove     rmdir      rename     link       readdir    readdirplus
2068    0% 2       0% 0       0% 0       0% 172     0% 0       0%
fsstat     fsinfo     pathconf   commit
356     0% 356     0% 0       0% 1372    0%

When running a very simple MPI I/O example I still get:

File locking failed in ADIOI_Set_lock. If the file system is NFS, you
need to use NFS version 3 and mount the directory with the 'noac'
option (no attribute caching).

Anyone have any ideas? Anyone know of an MPICH mailing list?

Additional info:
n1:~> uname -a
Linux n1 2.4.9 #5 SMP Wed Sep 26 19:59:17 GMT-7 2001 i686 unknown
n2:~> uname -a
Linux n2 2.4.9 #5 SMP Wed Sep 26 19:59:17 GMT-7 2001 i686 unknown

-- 
Bill Broadley
Mathematics/Institute of Theoretical Dynamics
UC Davis

From TIMOTHY.R.WAIT at saic.com  Wed Nov 28 12:07:47 2001
From: TIMOTHY.R.WAIT at saic.com (Tim Wait)
Date: Wed Nov 25 01:01:55 2009
Subject: Xbox clusters?
References: <3C0523F6.254E0EE9@icase.edu> <20011128140455.E1210@velocet.ca> <20011128144018.G1210@velocet.ca>
Message-ID: <3C054413.3030004@apo.saic.com>

> So, the question is, with these numbers, how do people end up spending
> $250K on 40 or even 60-CPU clusters?
> 

Um, high speed interconnect at $1500/box, quality components,
>=512 MB per proc, rackmounts, big h/w raid storage, A/C...
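Back of the envelope, with made-up but plausible list prices (none of
these numbers are quotes), a hypothetical 40-node machine of that sort
adds up fast:

  40 x node (1U case, board, CPU, disk)   40 x $2000 = $ 80,000
  40 x 1GB registered memory              40 x $1000 = $ 40,000
  40 x Myrinet NIC + cable                40 x $1500 = $ 60,000
  Myrinet switch                                     = $ 30,000
  Racks, rails, PDUs                                 = $ 16,000
  RAID storage + front end                           = $ 15,000
  Spares, UPS, A/C work                              = $ 10,000
  -------------------------------------------------------------
                                                       $251,000

The interconnect and the "boring" line items swamp the $500-a-node
case in no time.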
tim

From schweng at master2.astro.unibas.ch  Thu Nov 29 05:24:30 2001
From: schweng at master2.astro.unibas.ch (Hans Schwengeler)
Date: Wed Nov 25 01:01:55 2009
Subject: Portland High Performance Fortran pghpf on Scyld cluster
Message-ID: <200111291324.OAA07606@master2.astro.unibas.ch>

Hello,

I want to use pghpf on our new Scyld cluster (b27-8). pgf77 and pgf90
work ok, but pghpf appears to hang during execution of the resulting
program. First trial was to point /usr/local/mpi/lib to /usr/lib/,
second try was building mpich-1.2.1 (from the Scyld ftp site after
applying the patches). Both have the result that f77 and f90 work,
but NOT pghpf. I also tried the advice from the pgi FAQ and replaced
mpi.o in /usr/local/pgi/linux86/lib/libpghpf_mpi.a but to no avail.

Test program is /home/schweng/util/mpich-1.2.1-6.6.beo/mpich-1.2.1/installtest/pi3.f.

/usr/local/bin/mpirun -np 2 pi3
 Process 0 of 2 is alive
Enter the number of intervals: (0 quits)
<-- here it hangs, i.e. Process 1 never comes to life.

Yours, Hans Schwengeler.

From matz at wsunix.wsu.edu  Fri Nov 30 10:59:14 2001
From: matz at wsunix.wsu.edu (Phillip D. Matz)
Date: Wed Nov 25 01:01:55 2009
Subject: time command defaults changed in RedHat 7.2 vs RedHat 6.2?
Message-ID: <003201c179d1$1aefe660$b4297986@chem.wsu.edu>

I am used to keeping track of the actual time (elapsed) a job takes to
complete on my cluster with the command line option "time" in RedHat
6.2. Recently I reinstalled RedHat 7.2 and now the "time" command
yields different results (as if the portable option "-p" is always
on). The man pages only help to tell me why the output looks the way
it does, but don't tell me how to change the default back to what it
looks like in a 6.2 installation.

Does anyone know which file I need to modify to make the time command
report the total elapsed time and not have the output be in the
portable format? Thanks!

Phil Matz

From rlatham at plogic.com  Fri Nov 30 11:22:49 2001
From: rlatham at plogic.com (Rob Latham)
Date: Wed Nov 25 01:01:55 2009
Subject: MPI I/O + nfs
In-Reply-To: <20011129203402.A26613@sphere.math.ucdavis.edu>; from bill@math.ucdavis.edu on Thu, Nov 29, 2001 at 08:34:02PM -0800
References: <20011129203402.A26613@sphere.math.ucdavis.edu>
Message-ID: <20011130142249.K10306@otto.plogic.internal>

On Thu, Nov 29, 2001 at 08:34:02PM -0800, Bill Broadley wrote:
> 
> I'm trying to get MPICH-1.2.2.3 MPI I/O + nfs working.

If you want ROMIO (MPI I/O), I strongly suggest using pvfs as the
"back end" for your file system. In the few cases I know of where a
customer used nfs as the back end, performance was downright poor (as
should be expected when you have to turn off all the caching).

start here: http://parlweb.parl.clemson.edu/pvfs/index.html

==rob

-- 
[ Rob Latham   Developer, Admin, Alchemist ]
[ Paralogic Inc. - www.plogic.com          ]
[                                          ]
[ EAE8 DE90 85BB 526F 3181  1FCF 51C4 B6CB 08CC 0897 ]

From lmeerkat at yahoo.com  Fri Nov 30 09:31:07 2001
From: lmeerkat at yahoo.com (L. Gritsenko)
Date: Wed Nov 25 01:01:55 2009
Subject: Scyld boot problem
Message-ID: <20011130173107.49716.qmail@web20609.mail.yahoo.com>

Maybe this will be helpful:
http://www.beowulf.org/pipermail/beowulf/2001-August/001057.html

=====

__________________________________________________
Do You Yahoo!?
Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1

From math at velocet.ca  Fri Nov 30 14:37:53 2001
From: math at velocet.ca (Velocet)
Date: Wed Nov 25 01:01:55 2009
Subject: Xbox clusters?
In-Reply-To: <3C06AC7B.77FFC84A@paralleldata.com>; from wsb@paralleldata.com on Thu, Nov 29, 2001 at 03:45:31PM -0600
References: <3450CC8673CFD411A24700105A618BD6170FA5@911TURBO> <3C06AC7B.77FFC84A@paralleldata.com>
Message-ID: <20011130173753.B1210@velocet.ca>

On Thu, Nov 29, 2001 at 03:45:31PM -0600, W Bauske's all...
> Steve Gaudet wrote:
> > 
> > 
> > > They buy from IBM/Compaq/HP or pick your favorite mainstream vendor.
> > 
> > If you find a Compaq GEM partner(we are), you fall into the Government,
> > Educational, and Medical category, and you can't beat the deals Compaq is
> > offering right now. For New England they have an Evo D500, PIV 1.5Ghz, 845,
> > 20Gb, 256mb, WIN2000, CD for $667.00 up to December 12th. Moreover if it's a
> > quantity they do even better on the price.
> > 
> 
> Wonder why medical? That's big business.
> 
> I'm in business to make money with clusters so I guess I wouldn't qualify
> for that program. However, I can build an equivalent node for less than
> $500. (Skipping the CD and win2k which I have no use for)
> 
> d845wnl     $130
> P4 1.5ghz   $152
> Case/PS     $ 30
> 20GB disk   $ 63
> 256MB dimm  $ 30
> AGP card    $ 20
> ==============
> total       $425
> 
> Shipping would be around $35 delivered to your door. All you need is
> a screwdriver to assemble...
> 
> The d845wnl has 10/100 built in and is PXE bootable.

Any athlon boards with new chipsets that are PXE bootable?

The PcChips M817 MLR has that, but it's not a great board, and an old
chipset.

From wsb at paralleldata.com  Fri Nov 30 15:47:14 2001
From: wsb at paralleldata.com (W Bauske)
Date: Wed Nov 25 01:01:55 2009
Subject: Xbox clusters?
References: <3450CC8673CFD411A24700105A618BD6170FA5@911TURBO> <3C06AC7B.77FFC84A@paralleldata.com> <20011130173753.B1210@velocet.ca>
Message-ID: <3C081A82.F1B4436B@paralleldata.com>

I PXE boot my tiger MP's (s2460) with Intel pro/100 pci adapters.
Adapters go for about $27 which I thought was fair to allow me to
boot/install without a floppy or CD. The floppy and CD combined are
more than that typically.

The boards I've used that have built-in Enet for Athlon have used some
sort of Netware boot capability which I know nothing about. (K7S5A I
think)

Wes

Velocet wrote:
> 
> On Thu, Nov 29, 2001 at 03:45:31PM -0600, W Bauske's all...
> > Steve Gaudet wrote:
> > > 
> > > 
> > > > They buy from IBM/Compaq/HP or pick your favorite mainstream vendor.
> > > 
> > > If you find a Compaq GEM partner(we are), you fall into the Government,
> > > Educational, and Medical category, and you can't beat the deals Compaq is
> > > offering right now. For New England they have an Evo D500, PIV 1.5Ghz, 845,
> > > 20Gb, 256mb, WIN2000, CD for $667.00 up to December 12th. Moreover if it's a
> > > quantity they do even better on the price.
> > > 
> > 
> > Wonder why medical? That's big business.
> > 
> > I'm in business to make money with clusters so I guess I wouldn't qualify
> > for that program. However, I can build an equivalent node for less than
> > $500. (Skipping the CD and win2k which I have no use for)
> > 
> > d845wnl     $130
> > P4 1.5ghz   $152
> > Case/PS     $ 30
> > 20GB disk   $ 63
> > 256MB dimm  $ 30
> > AGP card    $ 20
> > ==============
> > total       $425
> > 
> > Shipping would be around $35 delivered to your door. All you need is
> > a screwdriver to assemble...
> > 
> > The d845wnl has 10/100 built in and is PXE bootable.
> 
> Any athlon boards with new chipsets that are PXE bootable?
> 
> The PcChips M817 MLR has that, but it's not a great board, and an old
> chipset.
> 
> /kc
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From ron_chen_123 at yahoo.com  Fri Nov 30 19:41:47 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:55 2009
Subject: MPI I/O + nfs
In-Reply-To: <20011129203402.A26613@sphere.math.ucdavis.edu>
Message-ID: <20011201034147.26940.qmail@web14703.mail.yahoo.com>

There is no MPICH mailing list. You can email the MPICH developers
directly.

On the other hand, you may check the LAM MPI mailing list; maybe they
have encountered similar problems before:
http://www.lam-mpi.org/mailman/listinfo.cgi/lam-announce

-Ron

--- Bill Broadley wrote:
> Anyone have any ideas? Anyone know of an MPICH
> mailing list?

__________________________________________________
Do You Yahoo!?
Buy the perfect holiday gifts at Yahoo! Shopping.
http://shopping.yahoo.com

From ron_chen_123 at yahoo.com  Fri Nov 30 19:55:22 2001
From: ron_chen_123 at yahoo.com (Ron Chen)
Date: Wed Nov 25 01:01:55 2009
Subject: GCC/Fortran 90/95 questions
Message-ID: <20011201035522.85826.qmail@web14706.mail.yahoo.com>

> 2) Does gcc support f90 or f95? If not is there any
> GNU compiler that does, are any expected to be in
> the future?

There is a compiler called open64, which is SGI's compiler for IA64.
They have a C front-end, which is based on gcc, and they have another
for f90. (I don't know the details...)

Recently, they have ported the f90 front-end and run-time to other
compiler back-ends. Please read the note below for details.

http://open64.sourceforge.net/
http://sourceforge.net/tracker/?group_id=34861&atid=413342

-Ron

===========================================================
Porting open64 F90 front-end to Solaris

This patch ports the open64 Fortran90 compiler front end to the
sparc_solaris platform. Specifically, it ports these three executable
programs: "mfef90", "ir_tools", and "whirl2f". ANY OTHER COMPONENT OF
OPEN64 IS NOT IN THE SCOPE OF THIS PATCH.

Tested platforms include sparc_solaris, mips_irix and ia32_linux,
using both GNU gcc and the vendor compiler. Makefiles, some header
files and some c/c++ source files were modified for porting.

__________________________________________________
Do You Yahoo!?
Buy the perfect holiday gifts at Yahoo! Shopping.
http://shopping.yahoo.com
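A closing footnote to the MPI I/O + nfs thread above: the "File
locking failed in ADIOI_Set_lock" message means an ordinary fcntl()
byte-range lock failed, and over NFS those locks go through the
network lock manager -- so besides nfsvers=3 and noac it is worth
checking that rpc.statd is running and that the kernel lockd can reach
the server. The small program below is only a sketch to isolate the
ROMIO side; the "nfs:" filename prefix (which ROMIO accepts to force a
particular driver) and the /d0/locktest path are assumptions matched
to the mounts shown in that thread.

#include <stdio.h>
#include "mpi.h"

int main (int argc, char **argv)
{
    MPI_File fh;
    MPI_Status status;
    int rank, value;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    value = rank;

    /* "nfs:" forces ROMIO's NFS driver; /d0 is the noac-mounted export */
    MPI_File_open (MPI_COMM_WORLD, "nfs:/d0/locktest",
                   MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);

    /* each rank writes its own int; this path exercises ADIOI_Set_lock */
    MPI_File_write_at (fh, (MPI_Offset) (rank * sizeof (int)),
                       &value, 1, MPI_INT, &status);

    MPI_File_close (&fh);
    MPI_Finalize ();
    return 0;
}

If this small case fails with the same ADIOI_Set_lock error, the
problem is in the NFS locking setup rather than in the application --
and Rob's suggestion of pvfs as the back end sidesteps NFS locking
entirely.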