PowerPC assembly language beginner's guide.

Chapter 6

In this, and the next few chapters, we will be writing a larger Mac application in PPC assembly language. We will be making deliberate mistakes to highlight some easily made errors and problems.

Answers to questions posed in Chapter 5

In the last chapter I left you with some questions. The one most people found baffling was how to optimize the code that changes the three background colour components. Here is the original unoptimized code:

**Change the bg color
	lwz	r3,my_bg_color(rtoc)
	lhz	r4,(r3)	*get red
	addi	r4,r4,220	*add to it
	sth	r4,(r3)	*store red
	lhz	r4,2(r3)	*get green
	subi	r4,r4,280
	sth	r4,2(r3)	*Store green
	lhz	r4,4(r3)	*get blue
	subi	r4,r4,180
	sth	r4,4(r3)	*Store blue
	Xcall	RGBBackColor	r3	

The answer is given by Fantasm's Stall Warning Generator. If we switch it on for the red component calculation, we will get dependency warnings for r4 at the addi instruction. You can do this by modifying the code to read:

	swg_med		*switch stall warning generator to medium sensitivity
	lhz	r4,(r3)	*get red
	addi	r4,r4,220	*add to it (SWG Warning here)
	sth	r4,(r3)	*store red
	swg_off		*swg off

Why is the SWG giving a warning? The lhz loads an unsigned 16 bit value from the memory that r3 points at (my_bg_color) into r4. If the processor can't fetch that value on this clock cycle, there will be a delay before the add can proceed, because the add needs the contents of r4 as one of its operands. In other words, the processor will stall.

If we bear in mind that the PPC can issue memory requests and then get on with other things, then we can prevent this stall by moving the add further down the instruction stream, so even if there is a delay getting the red component of the colour, by the time we need the data for the add it should be available.

Fine. So how do we move the processing (the add) further down the line? Does this work?

	lhz	r4,(r3)	*get red
	nop
	nop
	nop
	addi	r4,r4,220	*add to it

Nope. All that does is introduce a three clock cycle delay which doesn't achieve anything (apart from wasting three clock cycles). What we need is useful processing in between the load and the calculation. How about we issue other memory requests whilst we're waiting for this data to arrive? Can we do that? It would be cool if we could. Well, is the PPC a powerful chip? Course it is. Have a look at this:

**Change the bg color
	lwz	r3,my_bg_color(rtoc)	*The colour we are changing

**get the red,green and blue components
	lhz	r4,(r3)	*get red
	lhz	r5,2(r3)	*get green
	lhz	r6,4(r3)	*get blue

**now the processing
	addi	r4,r4,220	*add to it
	sth	r4,(r3)	*store red
	subi	r5,r5,280
	sth	r5,2(r3)	*Store green
	subi	r6,r6,180
	sth	r6,4(r3)	*Store blue
	Xcall	RGBBackColor	*r3 points to the color	

By rearranging the code and using two more registers we have eliminated any stalls that may occur if the data isn't immediately available. The highly astute among you may question a possible stall at the Xcall instruction because we are storing r6 in 4(r3) immediately before the call.

There will be no stall here. Why?

Two reasons. Firstly, the reading from the background color in the preceding code will have ensured the data is in the level 1 cache, so there will be no delay writing it. Secondly, Xcall will run through at least four instructions before the data is needed (would be five in compiled code because functions are normally called via a branch and link (bl), whereas Fantasm does it in-line and saves two instructions per OS call).
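The effect of the reordering can be checked with a toy model. The sketch below is written in C rather than assembly so it can be run anywhere, and it is not how Fantasm's SWG works. It assumes a single-issue, in-order processor in which a load delivers its data a fixed number of cycles after it is issued (LOAD_LATENCY = 3 is an invented figure) and every other instruction delivers its result one cycle later. The base register r3 is assumed to be ready throughout. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS     32
#define LOAD_LATENCY 3   /* assumed: cycles from issuing a load to usable data */
#define ALU_LATENCY  1   /* assumed: cycles for an add/subtract result */

typedef enum { OP_LOAD, OP_ALU, OP_STORE } OpKind;

typedef struct {
    OpKind kind;
    int dest;   /* register written, or -1 if none */
    int src;    /* register whose value is needed, or -1 if none */
} Instr;

/* Count the cycles an in-order pipeline spends waiting for a register
   that an earlier instruction has not finished producing. */
int count_stalls(const Instr *code, size_t n)
{
    int ready_at[NUM_REGS] = {0};
    int cycle = 0, stalls = 0;

    for (size_t i = 0; i < n; i++) {
        int src = code[i].src, dest = code[i].dest;
        assert(src < NUM_REGS && dest < NUM_REGS);

        if (src >= 0 && ready_at[src] > cycle) {   /* operand not ready: stall */
            stalls += ready_at[src] - cycle;
            cycle = ready_at[src];
        }
        if (dest >= 0)
            ready_at[dest] = cycle +
                (code[i].kind == OP_LOAD ? LOAD_LATENCY : ALU_LATENCY);
        cycle++;                                   /* one instruction issued */
    }
    return stalls;
}

/* lhz/addi/sth three times over, always through r4 */
const Instr original_code[9] = {
    { OP_LOAD, 4, -1 }, { OP_ALU, 4, 4 }, { OP_STORE, -1, 4 },   /* red   */
    { OP_LOAD, 4, -1 }, { OP_ALU, 4, 4 }, { OP_STORE, -1, 4 },   /* green */
    { OP_LOAD, 4, -1 }, { OP_ALU, 4, 4 }, { OP_STORE, -1, 4 },   /* blue  */
};

/* all three loads first, into r4, r5 and r6, then the processing */
const Instr reordered_code[9] = {
    { OP_LOAD, 4, -1 }, { OP_LOAD, 5, -1 }, { OP_LOAD, 6, -1 },
    { OP_ALU, 4, 4 }, { OP_STORE, -1, 4 },
    { OP_ALU, 5, 5 }, { OP_STORE, -1, 5 },
    { OP_ALU, 6, 6 }, { OP_STORE, -1, 6 },
};
```

Under these assumptions count_stalls(original_code, 9) reports 6 wasted cycles (two per load) and count_stalls(reordered_code, 9) reports none. The real figures depend on the processor and on whether the data is already in the cache.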

Just for your information, the possible settings for the SWG are as follows:

Please note that the SWG is inoperative in demo versions of Fantasm as the demos are distributed as 68K builds only. Due to the heavy workload required for the SWG to operate (it has to emulate the processor) it is coded in PPC assembly.

The other question involved how to prevent abrupt colour changes. Well, the answer to this one was simply to avoid the dramatic change that occurs when a colour component switches from a large value (e.g. 0xffff) to a small value (e.g. 0) or the other way round. The last question was how to speed up the colour changing without having to fill the whole rectangle. The answer was to use Color Look Up Tables (CLUTs) in 256 color mode - an example of this is given on the Fant5 CD in the example "Clut Fade" so I won't reproduce it here.
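The abrupt change comes from 16 bit wrap-around: adding to a component near 0xffff, or subtracting from one near 0, lands at the opposite end of the range. One way to avoid it, and this is an assumption since there are several, is to clamp the result. A minimal C sketch with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Plain 16 bit arithmetic, as sth stores it: the result wraps round. */
uint16_t step_wrapping(uint16_t component, int32_t delta)
{
    return (uint16_t)((int32_t)component + delta);
}

/* Clamp at the ends of the 0..0xFFFF range instead of wrapping. */
uint16_t step_clamped(uint16_t component, int32_t delta)
{
    assert(delta >= -0xFFFF && delta <= 0xFFFF);
    int32_t v = (int32_t)component + delta;
    if (v < 0)      v = 0;
    if (v > 0xFFFF) v = 0xFFFF;
    return (uint16_t)v;
}
```

With the red step of 220, step_wrapping(0xFFF0, 220) jumps down to 0x00CC, whereas step_clamped(0xFFF0, 220) stays at 0xFFFF. Reversing the direction of the step when an end of the range is reached would be another option.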

Other matters
We need to point out that MacsBug versions prior to 6.5.4aX will simply not work on MacOS version 8.00 or later.

Macintosh Applications

The question most frequently asked by beginners is "How does a Macintosh Application fit together? How does it run?".

In common with most Graphical User Interface (GUI) operating systems, the Mac runs via "events". Every time the user does something such as clicking the mouse button or pressing a key, the OS generates an "event" and places it in the event queue. If the application in the foreground - the one the user is interacting with - examines this event queue, it can find out what the user is doing by processing the events as they appear in the queue. It is a mistake to say the OS "sends" an event to the application - it doesn't, it simply places the events in the queue. It is up to the application to get the events out of the queue. There are lots of different events that can be placed in the queue - Apple Events for example - but luckily there are just a few that "really matter".

These, with the possible exception of the diskette insertion events, are the major events you need to handle in order that your application will run and behave under the MacOS.  

The mouse and keyboard events should hopefully be self explanatory - you need to know when the mouse is clicked or held down and what keys the user is pressing.

The update requests are put in the queue when the OS thinks one of your windows needs to be redrawn (for example because another window has moved, or closed).

The activate event is placed in the queue when the OS thinks you need to activate (or deactivate) a window because the user has clicked in another window or application. Activating a window generally means showing any controls (scrollbars for example) associated with the window, activating any highlighting visible, activating the caret (if you have one) etc. Deactivating is the opposite - hiding everything (unlike Windows 95 which thinks it's fine to have scrollbars active on windows in the background!).

Whenever you pull an event out of the queue, the format of the event "record" always follows this structure:

Size in bytes   Name        Offset   Description
2               what        0        Type of event (0=null)
4               message     2        Depends on event type
4               when        6        Ticks since system startup (60ths of a second)
4               where       10       Mouse position in global coordinates (y/x)
2               modifiers   14       State of Apple, option, ctrl etc keys

Thus we can see that an event record is 16 bytes in size.
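The same layout can be written as a C structure, which lets the compiler confirm the 16 byte total. This is a sketch: the field names follow Apple's Events.h, but the fixed-width types and the packing pragma are choices made here so that a modern compiler reproduces the classic Mac OS 2 byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic Mac OS structures are 2 byte aligned. Without the pragma a modern
   compiler would pad 'message' out to offset 4 and the size would be wrong. */
#pragma pack(push, 2)
typedef struct {
    int16_t v;              /* vertical (y) comes first */
    int16_t h;              /* horizontal (x) */
} Point;

typedef struct {
    uint16_t what;          /* type of event (0 = null) */
    uint32_t message;       /* depends on event type */
    uint32_t when;          /* ticks since system startup */
    Point    where;         /* mouse position in global coordinates */
    uint16_t modifiers;     /* state of Apple, option, ctrl etc keys */
} EventRecord;
#pragma pack(pop)

static_assert(offsetof(EventRecord, message) == 2, "message follows what");
static_assert(sizeof(EventRecord) == 16, "event record is 16 bytes");
```

In assembly the same offsets are used directly: with a register pointing at the record, the event type is the halfword at offset 0 and the modifiers are the halfword at offset 14.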

Event types (the what) are declared as follows:
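For reference, the standard event codes from Apple's Events.h can be written as a C enum. The helper event_mask_bit is hypothetical; it shows how a code relates to the bit that selects it in a 16 bit event mask.

```c
#include <assert.h>

enum {
    nullEvent       = 0,
    mouseDown       = 1,
    mouseUp         = 2,
    keyDown         = 3,
    keyUp           = 4,
    autoKey         = 5,    /* key held down and repeating */
    updateEvt       = 6,
    diskEvt         = 7,    /* disk inserted */
    activateEvt     = 8,
    osEvt           = 15,   /* suspend, resume, mouse moved */
    kHighLevelEvent = 23    /* Apple Events */
};

/* Mask bit for event codes 1..15: bit n selects event code n.
   kHighLevelEvent is the exception and has its own mask value, so it is
   deliberately outside the range accepted here. */
unsigned event_mask_bit(int what)
{
    assert(what >= 1 && what <= 15);
    return 1u << what;
}
```

For example, event_mask_bit(mouseDown) is 0x0002 and event_mask_bit(activateEvt) is 0x0100, matching Apple's mDownMask and activMask.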

Thus we can draw up a plan for a standard Mac application:

1. Initialise.
2. Process events.
3. Quit.

Simple huh?


How do we get the events out of the queue? We use an OS function called WaitNextEvent. WaitNextEvent takes four parameters and returns a Boolean value. The return value is true if there is an event for us, or false if there is no event, in which case the type of event will be null.

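The high level definition of WaitNextEvent is:

	Boolean WaitNextEvent(EventMask eventMask, EventRecord *theEvent, UInt32 sleep, RgnHandle mouseRgn);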


eventMask is a 16 bit value specifying which events you are interested in - pass -1 to get all events.

theEvent is a pointer to a 16 byte eventrecord.

sleep is the maximum number of ticks for which your application agrees to let the OS (other applications) have control (cooperative multitasking).

mouseRgn specifies a region inside of which mouse movement does not cause mouse moved events. Pass null if you don't want any mouse moved events.
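To see how the pieces fit, here is a small C model of the queue and of a call with the same four-parameter shape as WaitNextEvent. It is a toy, not the Toolbox: mock_post_event stands in for the OS placing events in the queue, MockEvent is a cut-down event record, sleep and mouse_rgn are accepted but ignored, and every name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { nullEvent = 0, mouseDown = 1, keyDown = 3, updateEvt = 6 };

typedef struct {
    uint16_t what;
    uint32_t message;
} MockEvent;                        /* cut-down stand-in for the event record */

#define QUEUE_MAX 8
static MockEvent queue[QUEUE_MAX];
static size_t queue_count;

/* The "OS" side: place an event at the back of the queue. */
void mock_post_event(uint16_t what, uint32_t message)
{
    assert(queue_count < QUEUE_MAX);
    queue[queue_count++] = (MockEvent){ what, message };
}

/* The application side: fetch the oldest event selected by event_mask.
   Returns false, with ev->what set to nullEvent, if there is none. */
bool mock_wait_next_event(uint16_t event_mask, MockEvent *ev,
                          uint32_t sleep, void *mouse_rgn)
{
    (void)sleep;
    (void)mouse_rgn;
    for (size_t i = 0; i < queue_count; i++) {
        uint16_t what = queue[i].what;
        if (what < 16 && (event_mask & (1u << what))) {
            *ev = queue[i];
            for (size_t j = i + 1; j < queue_count; j++)   /* close the gap */
                queue[j - 1] = queue[j];
            queue_count--;
            return true;
        }
    }
    ev->what = nullEvent;
    ev->message = 0;
    return false;
}

/* Pull events until a key press, which this toy treats as "quit".
   Returns the number of events handled. */
int process_events(void)
{
    MockEvent ev;
    int handled = 0;
    for (;;) {
        /* 0xFFFF is -1 as a 16 bit value: all events please */
        if (!mock_wait_next_event(0xFFFF, &ev, 10, NULL))
            break;      /* a real app would loop again; the toy stops here */
        handled++;
        if (ev.what == keyDown)
            break;
    }
    return handled;
}
```

Note that the application always does the pulling. Events that are not selected by the mask simply stay in the queue until somebody asks for them.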

If you remember back to chapter 5 we talked about calling OS functions - we could do it three different ways - the hard way for masochists, coding it all by hand, the easier way by using Xcall and the easy and error proof way by using OScalls. In chapter 5 we used Xcall in our code. Now we will switch to using the OS calls found in the file Universal_OS_calls_plus.def. This new file is available from the Updates area. It isn't included in the demo distribution or on Fant5 CDs prior to 1st May 98. This is Fantasm's definition of WaitNextEvent (taken out of Universal_OS_calls_plus.def):

Using this method we can call WaitNextEvent as:

OSWaitNextEvent r3,r4,r5,r6,r3

This means we place the four arguments in r3-r6 and expect the boolean return value (either 0 or not 0) in r3. To be able to do this we need to make Universal_OS_calls_plus.def a Globinc - this means add the file to our project and then move it into the Global Includes or _Globincs area of the project window (see below).

You may note that the definition of OSWaitNextEvent is a macro. Macros are a way of replacing one instruction (in this case OSWaitNextEvent) with many instructions. Macros can call other macros (in this case, the OSWaitNextEvent macro calls map_in_4 and map_out - these are other macros) and can take parameters, referenced as \1, \2, \3 etc. For more on macros please consult your documentation.


The Project

That gives us all the theory we need to get on with writing a Mac App. First, let's create the project. Run Anvil, and then from the Project Menu select "Create New Project". A dialog box like the one below will open (we've switched on the item labels by clicking the little help icon). From the Project template menu select PPC Fantasm App and change the file name to whatever you want - in our case we called it "Chapter 6".


Now click the OK button which will bring up a file selector box. Select (or make) a folder to create this project in and then save. A new project window will open in Anvil. Because we will be using Universal_OS_calls_plus.def we need to add it to the project and make it a globinc - this means Universal_OS_calls_plus.def will be included in every one of our source files automatically. To do this, click on the little disk icon at the top of the project window. From the file selector navigate to the folder called "Anvil Low Level Defs" and select Universal_OS_Calls_Plus.def. Anvil will place the file in the _Src area of the project. You need to drag it into the _Globincs area so the project window looks like this:

Now we can start writing our program. From Anvil's File menu select New. This will create a new source file in memory and open a window. If you haven't set Anvil's default language to PPC (via the General Preferences option) then you will need to set this file's language to PPC. Save the file as "Main.s". We will try to make this program as modular as possible, so our main file is going to call three routines: init, events and terminate. This will develop into a reasonably sized program, so structure is all important here. It will seem as if we are generating a lot of files with very little in them. This is true, because we are architecting a large project by dummying pieces of code. This is a good practice to get into.

For now we know we need to call at least three routines, so as well as our main file, we will also generate three other files: Init.s, Events.s and Term.s.


Our main file needs to look like this:

From that we can see the program's top level structure - it initialises, handles events and then quits. Notice that:
a). All branches and labels are colored blue by Anvil (as a default - you can change the colors).

b). We have declared main as being global and it's also the entry point for the program - this is where it starts running.

c). We have declared init,events and terminate as extern. This means these items are external to this file. They can be found in other files.

Now we can create the other files - even though we have no idea what code will go in them yet, we can still create them and define some structure - init.s for example looks like this:

Do the same for Events.s and Term.s.

Now we can add these files to our project. If you were "canny" you may have created a folder for your source code - this is a good idea. We add the files to the project the same way we added Universal_OS_Calls_Plus.def - by clicking the little disk icon in the project window. You should end up with a project looking like this:  

And then if you build the project it'll look like this (DO NOT RUN IT AFTER BUILDING! There is no terminate routine):  

Now it's time to start writing some code. First we need to initialise the Macintosh as all good Mac apps must. Open the file Init.s and change it to call init_mac. This is a library routine that does all the initialising for us and saves some typing. There is a problem in that to call something, we normally use the link register (LR), and the link register is currently holding our return address to main! So we need to save it somewhere. How to save the Link Register? Here are some solutions...

1. We have 32 integer registers available, so we'll save it in a register rather than in memory - let's say that r27, r28 and r29 will be link register save areas. We need to copy the link register into r27. We do this with a mflr instruction which means Move From Link Register, and we can restore it with a mtlr instruction - I won't insult your intelligence with the expansion of that mnemonic.
The problem with that is you have to keep track of how many subroutines you have called, to be sure of saving the LR in the right register. Fast, but possibly difficult to maintain.

2. The easiest way by far of saving the link register as you go into some code, and restoring it for returning, is to use the macros sub_in and sub_out. These work by saving the link register and restoring it for you. To be able to do this, you need the file LS_PPC_Macros.def as a Globinc in your project. These macros require three instructions to either save or restore the link register. You can use these routines anytime.

3. You can push it onto the stack, and then pop it off when needed - this takes two instructions for each push and pop to the link register - first you need to move the link register into a General Purpose Register (GPR) then you need to push the register onto the stack (and the other way round to pop off the stack). To use this method you need to use Universal_OS_Calls_Plus.def.

4. If a routine doesn't call any other code, then there's no need to save the LR!

We will use the second method - it's slower, but as this is for beginners, we want to keep it simple. If you are quite happy with the other methods, then use one of them, as they're faster!
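Why the save is needed at all can be shown with a toy model in C. It models the push and pop idea from solution 3 (how sub_in and sub_out store the link register is not shown in this chapter), the addresses 0x100 and 0x200 are invented, and all the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 16

typedef struct {
    unsigned lr;                    /* link register: where blr returns to */
    unsigned stack[STACK_WORDS];
    size_t   sp;                    /* number of words on the toy stack */
} ToyCpu;

/* bl: overwrite LR with the address of the instruction after the call. */
void toy_bl(ToyCpu *cpu, unsigned return_address)
{
    cpu->lr = return_address;
}

/* mflr into a register, then push that register. */
void toy_save_lr(ToyCpu *cpu)
{
    assert(cpu->sp < STACK_WORDS);
    cpu->stack[cpu->sp++] = cpu->lr;
}

/* pop into a register, then mtlr. */
void toy_restore_lr(ToyCpu *cpu)
{
    assert(cpu->sp > 0);
    cpu->lr = cpu->stack[--cpu->sp];
}

/* main calls init, init calls init_mac.
   Returns the address that init's final blr would jump to. */
unsigned init_returns_to(int save_lr)
{
    ToyCpu cpu = {0};

    toy_bl(&cpu, 0x100);            /* main: bl init, should come back to 0x100 */
    if (save_lr)
        toy_save_lr(&cpu);          /* entry to init */
    toy_bl(&cpu, 0x200);            /* init: bl init_mac, comes back to 0x200 */
    /* init_mac is treated as a leaf: it returns without changing LR */
    if (save_lr)
        toy_restore_lr(&cpu);       /* exit from init */
    return cpu.lr;
}
```

With the save, init_returns_to(1) gives 0x100 and control goes back to main. Without it, init_returns_to(0) gives 0x200: init's blr lands just after its own bl, so it never gets back to main.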

To use sub_in and sub_out we need to make LS_PPC_Macros.def a globinc - this file contains lots of useful little macros to make PPC assembly language easier. Browse through it. Follow the procedure we used for Universal_OS_calls_plus.def above.

The whole point of that discussion was to highlight that in assembly language, there are no rules. You can do things as you see fit and that you are comfortable with. Maintaining good structure however is paramount, irrespective of the language.

So, back to our initialisation. We have made LS_PPC_Macros.def a Globinc, and we've modified Init.s to call init_mac. Init.s should now look like this:

If we now build this project, we will get errors from the linker. It'll say Init.s wants to link to init_mac but the reference doesn't exist. The linker is complaining because it can't find the code for init_mac.

init_mac is a library function, in this case held in the library called P_Application_Library. For the linker to find the function we need to add the library to our project. After adding the library, if you want to find out what functions it contains, just double click it in the project window. Anvil will ask the Librarian to open the library and display its contents. To get information about a function, click the function name in the Librarian's window.

A common mistake is to mis-spell a library function's name (labels in PPC are case sensitive). Suppose we had added P_Application_Library to our project but the code read:

bl Init_Mac

This would fail the same way as if the library was not present. The Linker would complain.

"Init_Mac" is NOT the same as "init_mac" in PPC because of case sensitivity.
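The linker's behaviour boils down to an exact, byte-for-byte name comparison. A minimal C sketch of that idea follows; the one-entry export list and the function name resolve_symbol are hypothetical, and the real Fantasm linker is of course more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical export list; only init_mac is named in the text. */
static const char *const library_exports[] = { "init_mac" };

/* Return 1 if the library exports exactly this name, otherwise 0.
   strcmp compares byte for byte, so upper and lower case differ. */
int resolve_symbol(const char *name)
{
    size_t n = sizeof library_exports / sizeof library_exports[0];

    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(library_exports[i], name) == 0)
            return 1;
    return 0;
}
```

Here resolve_symbol("init_mac") succeeds while resolve_symbol("Init_Mac") fails, which is exactly the unresolved reference the linker complains about.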


So, we're initialised. It's easy to spot an App which has forgotten to call init_mac - it crashes a lot. Normally when you try to open a window, or call QuickDraw, it'll crash almost instantly. The last thing we need to do in this section is to be able to terminate correctly. The OS provides a function called ExitToShell. This will immediately terminate your application and is the recommended way to quit. We need to modify Term.s to call ExitToShell as follows:


All it will do is boot up and then quit, but, and this is the important bit, we have defined the structure of a useful project. One we can add pieces of code to, within a defined structure.

Now, let's take some time out to think about a typical Mac App. What are the elements of the interface?

The main elements include menus, windows and dialog boxes along with more minor details such as the mouse cursor and maybe even sound. Now what services does a typical Mac App require? Filing system services are probably way up on the agenda - the ability to read and save files is important. We need to look at all these things.

You may notice that not once have we specified what this app does; for the purposes of this tutorial it is irrelevant.

In the next part we will expand our application to be able to handle simple events with specific emphasis on how to run the menus. We will look at initialising and dynamically changing menus to suit the current context along with acting on menu selections.

Postscript: Some are worried this series is going the wrong way - concentrating too much on the Mac OS and application world, rather than specific low level techniques such as writing to video memory etc. Please do not worry, we will come to that. The information is already out there in the example applications we provide with Fantasm. In the meantime, if you do have any specific questions about this area, please direct them to lightsoft@zedworld.demon.co.uk

Copyright Lightsoft 1998.

Reproduction in whole or part prohibited without permission.
