Chapter xx Math & Algorithms ---------------------------------------------------------------------------- Contents Re: long integer square root Re: long integer square root Re: long integer square root Re: Fixed Math sqrt, asin, acos Re: Area of a traingle formula wanted (not the easy one) Re: Area of a traingle Re: Point in Polygon routine needed Re: card shuffling algorithm Re: Reversing the bytes in a long... Re: Reversing the bytes in a long... void comparePStrings(); C inline swapping functions Re: Preloading regions? Is that a good idea? Re: Bit Manipulation help needed. Re: faster division routine wanted. Re: faster division routine wanted. Re: (Q) Need bit manipulation algorithm Re: faster division routine wanted. Re: faster division routine wanted. Re: (Q) Need bit manipulation algorithm Re: (Q) Need bit manipulation Re: (Q) Need bit manipulation algorithm Julia's Dream algorithm + example code Detecting double click? Re: (Q) Need bit manipulation algorithm Re: (Q) Need bit manipulation algorithm Re: (Q) Need bit manipulation algorithm Re: How do you draw a circle/oval? Re: 64-bit multiply and divide on 68000 Re: looking for a REALLY RaNdOM "RANDOM" Re: Easter was: Re: Algorithm for Day-Of-Week NEEDED Re: Easter Sunday algorythm fastQSort.p (Pascal port of fastQsort.c) Re: Algorithm for Day-Of-Week NEEDED Re: Better random number than Random()? Re: Algorithm for Day-Of-Week NEEDED Re: Better random number than Random()? Re: Algorithm for Day-Of-Week NEEDED Re: Algorithm for Day-Of-Week NEEDED Re: Algorithm for Day-Of-Week NEEDED Re: Looking for fast sorting code for pascal Pascal ---------------------------------------------------------------------------- From:jnhall@sat.mot.com (Joseph Hall) Subject: Re: long integer square root In article 0; i >>= 2, k0 <<= 1) ; nn <<= 2; for (;;) { k1 = (nn / k0 + k0) >> 1; if (((k0 ^ k1) & ~1) == 0) break; k0 = k1; } return (int) ((k1 + 1) >> 1); } ---------------------------------------------------------------------------- From: jimc@tau-ceti.isc-br.com (Jim Cathey) Subject: Re: long integer square root In article <220o6u$cn@email.tuwien.ac.at> estevez@atp3100.tuwien.ac.at (Ernesto Estevez) writes: >Does somebody have code or an algorithm for extracting a long integer >square root and returning an integer? There was a long series of letters in Dr. Dobbs' Journal a few years back that I was a part of. Here are two 'competing' subroutines for the 68000 that I wrote. One is a Newton-Raphson iterator, the other a hybrid of three different subroutines using two different methods, heavily optimized for speed. The non-Newton one is faster in some cases and slower in others. Choosing one as 'best' depends on what kind of input ranges you expect to see most often. Newton's time doesn't vary much on the average based on the input argument. The non-Newton one ranges from a lot better (small arguments) to a little worse (large arguments), but is always better than Newton's worst case. Don't neglect the fact that on a FPU-equipped machine it may be faster to use floating point. I've done no research into this possibility, nor have I timed this stuff on any 68020+ system. The speed balance is no doubt different then. (For that matter, the 68000 testbed had no wait states nor interrupt system overhead, which doesn't necessarily apply to a 68000 PowerBook, and certainly doesn't apply to a Mac+ or earlier.) I'm personally fond of the non-Newton version, because the algorithm only uses shifts and adds, so it could be implemented in microcode with about same speed as a divide. ************************************************************************* * * * Integer Square Root (32 to 16 bit). * * * * (Newton-Raphson method). * * * * Call with: * * D0.L = Unsigned number. * * * * Returns: * * D0.L = SQRT(D0.L) * * * * Notes: Result fits in D0.W, but is valid in longword. * * Takes from 338 cycles (1 shift, 1 division) to * * 1580 cycles (16 shifts, 4 divisions) (including rts). * * Averages 854 cycles measured over first 65535 roots. * * Averages 992 cycles measured over first 500000 roots. * * * ************************************************************************* .globl lsqrt * Cycles lsqrt movem.l d1-d2,-(sp) (24) move.l d0,d1 (4) ; Set up for guessing algorithm. beq.s return (10/8) ; Don't process zero. moveq #1,d2 (4) guess cmp.l d2,d1 (6) ; Get a guess that is guaranteed to be bls.s newton (10/8) ; too high, but not by much, by dividing the add.l d2,d2 (8) ; argument by two and multiplying a 1 by 2 lsr.l #1,d1 (10) ; until the power of two passes the modified bra.s guess (10) ; argument, then average these two numbers. newton add.l d1,d2 (8) ; Average the two guesses. lsr.l #1,d2 (10) move.l d0,d1 (4) ; Generate the next approximation(s) divu d2,d1 (140) ; via the Newton-Raphson method. bvs.s done (10/8) ; Handle out-of-range input (cheats!) cmp.w d1,d2 (4) ; Have we converged? bls.s done (10/8) swap d1 (4) ; No, kill the remainder so the clr.w d1 (4) ; next average comes out right. swap d1 (4) bra.s newton (10) done clr.w d0 (4) ; Return a word answer in a longword. swap d0 (4) move.w d2,d0 (4) return movem.l (sp)+,d1-d2 (28) rts (16) end ************************************************************************* * * * Integer Square Root (32 to 16 bit). * * * * (Exact method, not approximate). * * * * Call with: * * D0.L = Unsigned number. * * * * Returns: * * D0.L = SQRT(D0.L) * * * * Notes: Result fits in D0.W, but is valid in longword. * * Takes from 122 to 1272 cycles (including rts). * * Averages 610 cycles measured over first 65535 roots. * * Averages 1104 cycles measured over first 500000 roots. * * * ************************************************************************* .globl lsqrt * Cycles lsqrt tst.l d0 (4) ; Skip doing zero. beq.s done (10/8) cmp.l #$10000,d0 (14) ; If is a longword, use the long routine. bhs.s glsqrt (10/8) cmp.w #625,d0 (8) ; Would the short word routine be quicker? bhi.s gsqrt (10/8) ; No, use general purpose word routine. * ; Otherwise fall into special routine. * * For speed, we use three exit points. * This is cheesy, but this is a speed-optimized subroutine! page ************************************************************************* * * * Faster Integer Square Root (16 to 8 bit). For small arguments. * * * * (Exact method, not approximate). * * * * Call with: * * D0.W = Unsigned number. * * * * Returns: * * D0.W = SQRT(D0.W) * * * * Notes: Result fits in D0.B, but is valid in word. * * Takes from 72 (d0=1) to 504 (d0=625) cycles * * (including rts). * * * * Algorithm supplied by Motorola. * * * ************************************************************************* * Use the theorem that a perfect square is the sum of the first * sqrt(arg) number of odd integers. * Cycles move.w d1,-(sp) (8) move.w #-1,d1 (8) qsqrt1 addq.w #2,d1 (4) sub.w d1,d0 (4) bpl qsqrt1 (10/8) asr.w #1,d1 (8) move.w d1,d0 (4) move.w (sp)+,d1 (12) done rts (16) page ************************************************************************* * * * Integer Square Root (16 to 8 bit). * * * * (Exact method, not approximate). * * * * Call with: * * D0.W = Unsigned number. * * * * Returns: * * D0.L = SQRT(D0.W) * * * * Uses: D1-D4 as temporaries -- * * D1 = Error term; * * D2 = Running estimate; * * D3 = High bracket; * * D4 = Loop counter. * * * * Notes: Result fits in D0.B, but is valid in word. * * * * Takes from 544 to 624 cycles (including rts). * * * * Instruction times for branch-type instructions * * listed as (X/Y) are for (taken/not taken). * * * ************************************************************************* * Cycles gsqrt movem.l d1-d4,-(sp) (40) move.w #7,d4 (8) ; Loop count (bits-1 of result). clr.w d1 (4) ; Error term in D1. clr.w d2 (4) sqrt1 add.w d0,d0 (4) ; Get 2 leading bits a time and add addx.w d1,d1 (4) ; into Error term for interpolation. add.w d0,d0 (4) ; (Classical method, easy in binary). addx.w d1,d1 (4) add.w d2,d2 (4) ; Running estimate * 2. move.w d2,d3 (4) add.w d3,d3 (4) cmp.w d3,d1 (4) bls.s sqrt2 (10/8) ; New Error term > 2* Running estimate? addq.w #1,d2 (4) ; Yes, we want a '1' bit then. addq.w #1,d3 (4) ; Fix up new Error term. sub.w d3,d1 (4) sqrt2 dbra d4,sqrt1 (10/14) ; Do all 8 bit-pairs. move.w d2,d0 (4) movem.l (sp)+,d1-d4 (44) rts (16) page ************************************************************************* * * * Integer Square Root (32 to 16 bit). * * * * (Exact method, not approximate). * * * * Call with: * * D0.L = Unsigned number. * * * * Returns: * * D0.L = SQRT(D0.L) * * * * Uses: D1-D4 as temporaries -- * * D1 = Error term; * * D2 = Running estimate; * * D3 = High bracket; * * D4 = Loop counter. * * * * Notes: Result fits in D0.W, but is valid in longword. * * * * Takes from 1080 to 1236 cycles (including rts.) * * * * Two of the 16 passes are unrolled from the loop so that * * quicker instructions may be used where there is no * * danger of overflow (in the early passes). * * * * Instruction times for branch-type instructions * * listed as (X/Y) are for (taken/not taken). * * * ************************************************************************* * Cycles glsqrt movem.l d1-d4,-(sp) (40) moveq #13,d4 (4) ; Loop count (bits-1 of result). moveq #0,d1 (4) ; Error term in D1. moveq #0,d2 (4) lsqrt1 add.l d0,d0 (8) ; Get 2 leading bits a time and add addx.w d1,d1 (4) ; into Error term for interpolation. add.l d0,d0 (8) ; (Classical method, easy in binary). addx.w d1,d1 (4) add.w d2,d2 (4) ; Running estimate * 2. move.w d2,d3 (4) add.w d3,d3 (4) cmp.w d3,d1 (4) bls.s lsqrt2 (10/8) ; New Error term > 2* Running estimate? addq.w #1,d2 (4) ; Yes, we want a '1' bit then. addq.w #1,d3 (4) ; Fix up new Error term. sub.w d3,d1 (4) lsqrt2 dbra d4,lsqrt1 (10/14) ; Do first 14 bit-pairs. add.l d0,d0 (8) ; Do 15-th bit-pair. addx.w d1,d1 (4) add.l d0,d0 (8) addx.l d1,d1 (8) add.w d2,d2 (4) move.l d2,d3 (4) add.w d3,d3 (4) cmp.l d3,d1 (6) bls.s lsqrt3 (10/8) addq.w #1,d2 (4) addq.w #1,d3 (4) sub.l d3,d1 (8) lsqrt3 add.l d0,d0 (8) ; Do 16-th bit-pair. addx.l d1,d1 (8) add.l d0,d0 (8) addx.l d1,d1 (8) add.w d2,d2 (4) move.l d2,d3 (4) add.l d3,d3 (8) cmp.l d3,d1 (6) bls.s lsqrt4 (10/8) addq.w #1,d2 (4) lsqrt4 move.w d2,d0 (4) movem.l (sp)+,d1-d4 (44) rts (16) end ---------------------------------------------------------------------------- From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy) Subject: Re: long integer square root jnhall@sat.mot.com (Joseph Hall) writes: > In article writeth: estevez@atp3100.tuwien.ac.at (Ernesto Estevez) writes ... >Does somebody has a code or algorithm for extracting a long integer > square root and returning a integer. I suggest looking at Newton's iteration in any decent CS book on the subject. With a sufficiently good first guess you can do it in about 12 instructions on a 68020+. And yes, I have done it, and no, you can't have it. It belong to the company I work for. I pulled this one from my personal inventory. I hereby assign it to the public domain. Enjoy. /* * ISqrt-- * Find square root of n. This is a Newton's method approximation, * converging in 1 iteration per digit or so. Finds floor(sqrt(n) + 0.5). */ int ISqrt(register int n) { register int i; register long k0, k1, nn; for (nn = i = n, k0 = 2; i > 0; i >>= 2, k0 <<= 1) ; nn <<= 2; for (;;) { k1 = (nn / k0 + k0) >> 1; if (((k0 ^ k1) & ~1) == 0) break; k0 = k1; } return (int) ((k1 + 1) >> 1); } -- Here's an article I saved from c.s.m.p four months ago. Strangely, it was only distributed in North America. I don't know how fast it works in practice, but there are no multiplications or divisions in its inner loop, which is promising. From: bruner@sp10.csrd.uiuc.edu (John Bruner) Subject: Re: Fixed Math sqrt, asin, acos Here's an integer square root routine. ----------------------------------------------------------------------- #include /* * It requires more space to describe this implementation of the manual * square root algorithm than it did to code it. The basic idea is that * the square root is computed one bit at a time from the high end. Because * the original number is 32 bits (unsigned), the root cannot exceed 16 bits * in length, so we start with the 0x8000 bit. * * Let "x" be the value whose root we desire, "t" be the square root * that we desire, and "s" be a bitmask. A simple way to compute * the root is to set "s" to 0x8000, and loop doing the following: * * t = 0; * s = 0x8000; * do { * if ((t + s) * (t + s) <= x) * t += s; * s >>= 1; * while (s != 0); * * The primary disadvantage to this approach is the multiplication. To * eliminate this, we begin simplying. First, we observe that * * (t + s) * (t + s) == (t * t) + (2 * t * s) + (s * s) * * Therefore, if we redefine "x" to be the original argument minus the * current value of (t * t), we can determine if we should add "s" to * the root if * * (2 * t * s) + (s * s) <= x * * If we define a new temporary "nr", we can express this as * * t = 0; * s = 0x8000; * do { * nr = (2 * t * s) + (s * s); * if (nr <= x) { * x -= nr; * t += s; * } * s >>= 1; * while (s != 0); * * We can improve the performance of this by noting that "s" is always a * power of two, so multiplication by "s" is just a shift. Also, because * "s" changes in a predictable manner (shifted right after each iteration) * we can precompute (0x8000 * t) and (0x8000 * 0x8000) and then adjust * them by shifting after each step. First, we let "m" hold the value * (s * s) and adjust it after each step by shifting right twice. We * also introduce "r" to hold (2 * t * s) and adjust it after each step * by shifting right once. When we update "t" we must also update "r", * and we do so by noting that (2 * (old_t + s) * s) is the same as * (2 * old_t * s) + (2 * s * s). Noting that (s * s) is "m" and that * (r + 2 * m) == ((r + m) + m) == (nr + m): * * t = 0; * s = 0x8000; * m = 0x40000000; * r = 0; * do { * nr = r + m; * if (nr <= x) { * x -= nr; * t += s; * r = nr + m; * } * s >>= 1; * r >>= 1; * m >>= 2; * } while (s != 0); * * Finally, we note that, if we were using fractional arithmetic, after * 16 iterations "s" would be a binary 0.5, so the value of "r" when * the loop terminates is (2 * t * 0.5) or "t". Because the values in * "t" and "r" are identical after the loop terminates, and because we * do not otherwise use "t" explicitly within the loop, we can omit it. * When we do so, there is no need for "s" except to terminate the loop, * but we observe that "m" will become zero at the same time as "s", * so we can use it instead. * * The result we have at this point is the floor of the square root. If * we want to round to the nearest integer, we need to consider whether * the remainder in "x" is greater than or equal to the difference * between ((r + 0.5) * (r + 0.5)) and (r * r). Noting that the former * quantity is (r * r + r + 0.25), we want to check if the remainder is * greater than or equal to (r + 0.25). Because we are dealing with * integers, we can't have equality, so we round up if "x" is strictly * greater than "r": * * if (x > r) * r++; */ unsigned int isqrt(unsigned long x) { unsigned long r, nr, m; r = 0; m = 0x40000000; do { nr = r + m; if (nr <= x) { x -= nr; r = nr + m; } r >>= 1; m >>= 2; } while (m != 0); if (x > r) r++; return(r); } > -- Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy ---------------------------------------------------------------------------- From: kourakos@cardinal.ncsc.org (Alexander W. Kourakos) Subject: Re: Area of a traingle formula wanted (not the easy one) Area = 0.5 * (x1 * y2 + x2 * y3 + x3 * y1 - y1 * x2 - y2 * x3 - y2 * x1) (in standard orthagonal coordinates) This comes out of a formula for finding the area of any polygon, given the points. awk ---------------------------------------------------------------------------- From: cpruett@holonet.net Subject: Re: Area of a traingle Originator: cpruett@zen.holonet.net In a message dated 07-05-93 22:34 jeremy@belsize.demon.co.u wrote to comp.sys.mac.programmer: Here!s a easy one (but read the WHOLE question before flaming): Anyone have a general purpose formula to calc the area of a triangle - anyone who says 0.5*base length*perp height gets no prize, but anyone who can give me a formula using only the 3 co-ords of the end points (say, (x1,y1),(x2,y2),(x3,y3)) does get a big smile. Ideally, the value returned can also be -ve for a triangle with the end points in a different order (see what I mean ?) Thanks, Jeremy Doig Here's what I came up with: 2 Sqrt[(x2 y1 - x3 y1 - x1 y2 + x3 y2 + x1 y3 - x2 y3) ] ------------------------------------------------------ 2 I tried it out on a couple of test-triangles and it seems to work but you will want to verify it for yourself. I used Heron's formula: A = Sqrt[s(s-a)(s-b)(s-c)] where s = 0.5(a + b + c) and a,b, & c are the lengths of the three sides. Chris Pruett ---------------------------------------------------------------------------- Subject: Re: Point in Polygon routine needed From: fry@zariski.harvard.edu (David Fry) In article <8fxkxAO00WB7M=nOFO@andrew.cmu.edu> Andrew Lewis Tepper writes: >I don't know if this routine is "standard", I just came up with it recently: > >For a polygon of points p1...pn, and a point P, make a table as follows: > >T(1)= angle from p1 to P to p2 >T(2)= angle from p2 to P to p3 >... >T(n)= angle from pn to P to p1 > >express all angles as: -PI < angle < PI. > >Add all entries in the table. If the sum = 0, the point is outside. If >the sum is +/- PI, the point is inside. If the point is +/- xPI, you >have a strange polygon. If ANY angle was = +/-PI, the point is on the >border. > >Does anyone know if this is considered a good (known?) algorithm? It >took a long time to come up with! You have the essence of the "good," standard algorithm, but if you're working with thousands of polygons, those floating point calculations will be too slow and will probably lead to round-off error at some point. Round-off error is a huge pain in computational geometry problems. The hard part is finding ways to avoid it. The trick is that you don't need to know the angle between the lines, you just need to know its sign, called the *circulation*. For every line AB in the polygon, form the triangle PAB and you want to know does it lie inside the polygon or not. That is, if you were riding a bike counterclockwise around the boundary of the polygon, would P lie on your left or your right when you crossed AB? You can find this by calculating PA x PB = (A.x-P.x)*(B.y-P.y) - (A.y-P.y)*(B.x-P.x). The circulation (the sign of PA x PB) can be found without doing a multiplication in many cases, but this is good enough. Remember, we've oriented the polygon so that you come to A before coming to B as your traverse its boundary. Then the algorithm is: In = 0 foreach edge AB in the polygon if A.y < P.y and B.y >= P.y and PA x PB > 0 In = In + 1 if A.y >= P.y and B.y < P.y and PA x PB < 0 In = In - 1 Then if In = 1, P is inside the polygon, and is outside if I = 0. This doesn't handle the case where P lies on some edge AB of the polygon. David Fry ---------------------------------------------------------------------------- From: fixer@faxcsl.dcrt.nih.gov (Chris Mr. Tangerine Man Tate) Subject: Re: card shuffling algorithm Reply-To: fixer@faxcsl.dcrt.nih.gov Here's a little snippet to shuffle a deck of N cards in Theta(N) time (written in C for the Macintosh): void Shuffle(short deck[N]) { short i, j, temp; for (i = N; i > 1; i--) { j = abs(Random()) % i; // random # from 0 to i-1 temp = deck[j]; deck[j] = deck[i - 1]; deck[i - 1] = temp; // swap cards at positions i and j } } Explanation: working backwards from the "top" of the deck, swap each card with a random card from the as-yet-unvisited portion of the deck. Note that since it's possible to "swap" a card with itself, this algorithm ensures that you can generate the null shuffle, i.e. all cards still in their starting order. If you don't need to "deal out" all the cards at once (e.g. you're shuffling for a poker game rather than for bridge), you can simply do the random swap as each card is needed, and distribute the shuffling time across the dealing. ---------------------------------------------------------------------------- From: cklarson@flauta.engr.ucdavis.edu (Christopher Klaus Larson) Subject: Re: Reversing the bytes in a long... In article absurd@apple.apple.com (Tim Dierks) writes: >In article <1993Feb13.175719.4309@kth.se>, d88-jwa@hemul.nada.kth.se (Jon >Wtte) wrote: > In nicolai@plains.NoDak.edu (Steven Nicolai) writes: > > >I've encountered an interesting brain-teaser working on a program. What's > >the shortest sequence of 68K assembly that will reverse the order of the > >bytes in a long word. (ie convert a little endian number to big endian.) > > Hmm, wouldn't that just be 3 SWAP instructions? > [Stuff deleted] Ain't no SWAP.B; all you can do is swap words. Here's my solution: And here is a register-parameter inline version of Tim's solution (slow night in Davis): /* Code Begins */ #pragma parameter __D0 SwapLong (__D1) unsigned long SwapLong (unsigned long filpMe) = { 0x2001, /* move.l d1,d0 */ 0x4840, /* swap d0 */ 0x1001, /* move.b d1,d0 */ 0x4840, /* swap d0 */ 0x4841, /* swap d1 */ 0x1001, /* move.b d1,d0 */ 0xE198 /* rol.l #8,d0 */ }; /* Code Ends */ --Chris ---------------------------------------------------------------------------- From: cconstantine@galaxy.gov.bc.ca Subject: Re: Reversing the bytes in a long... In article <1993Feb17.151416.868@aio.jsc.nasa.gov>, mark@pokey.jsc.nasa.gov (Mark Manning) writes: Keith Rollin of Taligent was kind enough to donate some C code to me for a database program I am writing (it reads PC files & FoxBas+/Mac files) that would do exactly that. Here it is in C: void ConvertLong(long *num) { long tempLong; char* sourcePtr = (char*) num; char* destPtr = (char*) &tempLong; destPtr[0] = sourcePtr[3]; destPtr[1] = sourcePtr[2]; destPtr[2] = sourcePtr[1]; destPtr[3] = sourcePtr[0]; *num = tempLong; } and here it is in ASM: ConvertLong1: 00000000: 4E56 FFFC LINK A6,#$FFFC 00000004: 48E7 0018 MOVEM.L A3/A4,-(A7) 00000008: 286E 0008 MOVEA.L num(A6),A4 0000000C: 47EE FFFC LEA tempLong(A6),A3 00000010: 16AC 0003 MOVE.B $0003(A4),(A3) 00000014: 176C 0002 0001 MOVE.B $0002(A4),$0001(A3) 0000001A: 176C 0001 0002 MOVE.B $0001(A4),$0002(A3) 00000020: 1754 0003 MOVE.B (A4),$0003(A3) 00000024: 206E 0008 MOVEA.L num(A6),A0 00000028: 20AE FFFC MOVE.L tempLong(A6),(A0) 0000002C: 4CDF 1800 MOVEM.L (A7)+,A3/A4 00000030: 4E5E UNLK A6 00000032: 4E75 RTS 00000034 According to Keith, this is also a faster routine than doing a 1 line bitshift operation such as: void ConvertLong(long *num) { *num = (*num >> 24) | ((*num >> 8) & 0xFF00) | ((*num << 8) & 0xFF0000) | (*num << 24); } that one line produces the following in ASM: 00000000: 4E56 0000 LINK A6,#$0000 00000004: 2F0C MOVE.L A4,-(A7) 00000006: 286E 0008 MOVEA.L $0008(A6),A4 0000000A: 2014 MOVE.L (A4),D0 0000000C: 7218 MOVEQ #$18,D1 0000000E: E2A0 ASR.L D1,D0 00000010: 2214 MOVE.L (A4),D1 00000012: E081 ASR.L #$8,D1 00000014: 0281 0000 FF00 ANDI.L #$0000FF00,D1 0000001A: 8081 OR.L D1,D0 0000001C: 2214 MOVE.L (A4),D1 0000001E: E189 LSL.L #$8,D1 00000020: 0281 00FF 0000 ANDI.L #$00FF0000,D1 00000026: 8081 OR.L D1,D0 00000028: 2214 MOVE.L (A4),D1 0000002A: 7418 MOVEQ #$18,D2 0000002C: E5A9 LSL.L D2,D1 0000002E: 8081 OR.L D1,D0 00000030: 2880 MOVE.L D0,(A4) 00000032: 285F MOVEA.L (A7)+,A4 00000034: 4E5E UNLK A6 00000036: 4E75 RTS 00000038 NOTE: it's almost twice as long (if not longer) and therefore about 1/2 as fast!. Just my (Keith's really) $0.02 worth. Carl B. Constantine CCONSTANTINE@galaxy.gov.bc.ca ---------------------------------------------------------------------------- From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy) Subject: void comparePStrings(); Here's a little function I wrote that I thought I'd share with y'all. It compares Pascal strings in a manner suitable for sorting. It doesn't actually do alphabetical order. (I figure, in most cases, if you're going to try to do alphabetical order, you should use the Toolbox and properly handle diacriticals.) I wrote this because I have an app that processes lists of strings that have to be sorted so I can access them with a binary search. I decided IUComparePStrings() or whatever it's called would be too slow, and that I was going to have to roll my own. Here it is. Public domain. short comparePStrings(register unsigned char *str1, register unsigned char *str2, Boolean caseSen) { /* * This comparison routine returns 0 if the two strings passed to it * are equal, and a negative or positive number otherwise. The * values can be used to sort strings, i.e. the transitivity law * applies to this function, i.e. if cPS(s1, s2)<0 and * cPS(s2, s3)<0, then it's guaranteed that cPS(s1, s3) is also <0. * * But it doesn't exactly sort in alphabetical order. Specifically, * length is more important than the characters in the string. * For example, cPS("\pa", "\pb") == -1 because 'a' precedes 'b', * but cPS("\paa", "\pb") == 1 because "b" is shorter than "a". */ asm { move.b (str1), D0 // get 1st string's length sub.b (str2)+, D0 // subtract 2nd string's length bne @done2 // if unequal, we're done; return len diff movem.l D1-D3, -(A7) // save registers we'll need moveq #0, D2 // clear out length moveq #32, D3 // store ('a'-'A') in a handy register move.b (str1)+, D2 // put length in its register beq.s @done1 // if length is zero, we're done; return 0 subq #1, D2 // subtract one from length, to set up dbra tst.b caseSen // is this a case-sensitive compare? beq.s @nsLoop // no, go to that nonsensitive loop @sLoop: // yes, do this nonsensitive loop move.b (str1)+, D0 // get char from 1st string sub.b (str2)+, D0 // subtract off corresponding char from 2nd bne @done1 // if unequal, we're done; return char diff dbra D2, @sLoop // loop on length bra.s @done1 // we're done looping; return 0 @nsLoop: move.b (str1)+, D0 // get char from 1st string cmp.b #'Z', D0 // is it greater than 'Z'? bgt.s @nonUC0 // if so, jump to non-upper-case section cmp.b #'A', D0 // is it less than 'A'? blt.s @nonUC0 // if so, jump to non-upper-case section add D3, D0 // it's uppercase; make it lowercase @nonUC0: move.b (str2)+, D1 // get char from 2nd string cmp.b #'Z', D1 // is it greater than 'Z'? bgt.s @nonUC1 // if so, jump to non-upper-case section cmp.b #'A', D1 // is it less than 'A'? blt.s @nonUC1 // if so, jump to non-upper-case section add D3, D1 // it's uppercase; make it lowercase @nonUC1: sub.b D1, D0 // subtract 2nd char from 1st bne @done1 // if unequal, we're done; return char diff dbra D2, @nsLoop // loop on length // length is zero, we're done; return 0 @done1: movem.l (A7)+, D1-D3 // restore registers we used @done2: ext.w D0 // extend byte to a short and return it } } -- Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy ---------------------------------------------------------------------------- From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy) Subject: C inline swapping functions After seeing Jon and Matthias' three-instruction code for reversing the order of bytes in a long, I decided to make it a part of my personal library. I also decided to play around with that "#pragma parameter" thing, to make the functions inline. Here's what I ended up with; I'm posting it here because maybe I'll save someone the ten minutes it takes to assemble the code by hand. And also so that, if I did something stupid, someone will point it out. :-) Public domain, of course. /* * swapBytes() swaps the bytes of the word at the given location. * It doesn't usually need to return a value, but it's easy to do so * because of the implementation. The value returned is equal to * the 16-bit word, after it's been swapped. */ #pragma parameter _D0 swapBytes(__A0) short swapBytes(void *theShort) = { 0x3010, // move.w (A0), D0 0xE058, // ror.w D0 0x3080 // move.w D0, (A0) } ; /* * getIBMShort() returns the value of the 16-bit word at the given * location, with the lo and hi bytes reversed. It does the same * thing as swapBytes() except it doesn't store the new value into * the memory location. */ #pragma parameter __D0 getIBMShort(__A0) short getIBMShort(void *thePtr) = { 0x3010, // move.w (A0), D0 0xE058 // ror.w D0 } ; /* * getIBMLong() returns the value of the 32-bit longword at the * given location, with the byte order reversed. */ #pragma parameter __D0 getIBMLong(__A0) long getIBMLong(void *thePtr) = { 0x2010, // move.l (A0), D0 0xE058, // ror.w D0 0x4840, // swap D0 0xE058, // ror.w D0 } ; -- Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy ---------------------------------------------------------------------------- From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy) Subject: Re: Preloading regions? Is that a good idea? dmoney@magnus.acs.ohio-state.edu (Dean R Money) writes: >I'm writing a game that's played on a board with 150 small hexagons. I'm assuming these hexes are laid out in a regular grid... >Like Risk, when a player moves/attacks, he clicks on a region. I'm >using polygons when I draw the board in the first place, then when I >have to identify which hexagon is clicked on, I'm having to open 150 >regions, one by one, and use PtInRgn. 150's a little excessive. You can narrow it down to seven if you want to work hard, or nine if you don't. (I don't.) I just finished an app that uses a hex grid. As it happens, I needed to outline the hexes at times, as well at detecting clicks in them. So I decided to use regions. I built one region, the size and shape of one hex. (Wasn't easy, since it had to conform exactly to the bitmapped image our artist drew. I gave up on MoveTo()/LineTo() and went with FrameRect(). If your hexes are large, lines might work best for you; mine were 20x18.) I kept this guy in a global region handle. When I needed to outline a hex, I'd OffsetRgn() it to the appropriate place and FrameRgn() it. Since OffsetRgn() works relative, not absolute, you have to peek at the region's rgnBBox to move it to an absolute location. Since I did this all the time, I made a routine that would return me the global region, offset to the proper point, and would build the region if it happened to be the first time it was being called: RgnHandle CSkinPixels::getHexRgn(short offsetV, short offsetH) { if (theOneTrueHexRgn == NULL) { Rect r; theOneTrueHexRgn = NewRgn(); OpenRgn(); // Region-building code deleted. // (You don't care how my hexes look.) CloseRgn(theOneTrueHexRgn); } OffsetRgn(theOneTrueHexRgn, offsetH - (**theOneTrueHexRgn).rgnBBox.left, offsetV - (**theOneTrueHexRgn).rgnBBox.top); return theOneTrueHexRgn; } I also needed a routine to turn "hex grid" coordinates into QuickDraw h,v coordinates. The former, I call "loc," the latter, "offset." You probably already have a routine like this, though yours may look different (there's two schools of thought for laying out hex grids). void CSkinPixels::getHexOffset(short locV, short locH, short *vOffset, short *hOffset) { *hOffset = -40 + (locH+locV)*15; *vOffset = 92 + (locH-locV)*9; } But what you need is a routine that converts from _offset_ to _loc_. The first thing is to solve those equations in reverse. See those two got to turn them around and express them in terms of the other two variables. Yeah, that algebra stuff--you have to add equations together and everything. (I learned how to do this quickly in linear algebra, but I'm afraid I've forgotten it all. Sorry, Dr. Fink.) My two "inverse" functions ended up being: *locH = ((long) 3*hOffset + 5*vOffset - 340) / 90; *locV = ((long) 3*hOffset - 5*vOffset + 580) / 90; But that just gives you a point that's close to the one you want. After you know that, you have to go through all the adjacent hexes and test them to see if they're the proper ones. Here's what my "offset to loc" function ended up looking like: void CSkinPixels::getHexLocation(short vOffset, short hOffset, short *locV, short *locH) { short i, j; Point thePt; *locH = ((long) 3*hOffset + 5*vOffset - 340) / 90; *locV = ((long) 3*hOffset - 5*vOffset + 580) / 90; thePt.v = vOffset; thePt.h = hOffset; for (j = -1; j <= 1; ++j) { for (i = -1; i <= 1; ++i) { RgnHandle theRgn; short vO, hO; getHexOffset(*locV+j, *locH+i, &vO, &hO); theRgn = getHexRgn(vO, hO); if (PtInRgn(thePt, theRgn)) { goto foundIt; } } } DebugStr("\pCouldn't find it!"); foundIt: *locV += j; *locH += i; } A little more complicated, but it'll take you 150 times less storage, and will run up to 15 times as fast. (My app never hit the debug line, in case you were wondering. :-) Dean, if you want to chat more about this, email me; the details of this probably don't interest c.s.m.p... -- Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy ---------------------------------------------------------------------------- From: chris@benton.prepress.com (christopher m. knox) Subject: Re: Bit Manipulation help needed. In article Christopher E Rudolph, rudolph@unixg.ubc.ca writes: >First off I'm weak on and'ing and or'ing bits so bear with me if I.... Hi, (we Chris's need to stick together, eh ? :-)) A long integer is 32 bits, a short integer is 16 bits. Using toolbox calls which allow a long as a bit number, you can set and clear bits in any structure by passing an address to the structure: counting from LEFT to RIGHT and starting at 0 you can set/clear/test bit N in your long value V as follows: BitSet((Ptr) &V,(long) N); BitClr((Ptr) &V,(long) N); bitIsSet = BitTst((Ptr) &V,(long) N); If you want to go RIGHT to LEFT (as in your example), use (31 - N) instead of N. If it has to be fast, don't use toolbox calls: (from RIGHT to LEFT): t = 1L << N; /* N = 0 => 1, N = 2 => 2, N = 3 => 4 .... */ V |= t; /* set */ V &= ~t; /* clear */ bitIsSet = (V & t) ? 1 : 0; /* test */ If it has to be still faster, then you should READ the value from LEFT to RIGHT and fill it easily as follows: BTW, V <<= N shifts V N bits to the left and shifts in 0s at the right. > right 0000 0000 0000 0001 > right 0000 0000 0000 0011 > left 0000 0000 0000 0011 > right 0000 0000 0000 1011 > left 0000 0000 0000 1011 #define PushRIGHT(_V) {(_V) <<= 1; (_V) |= 1;} #define PushLEFT(_V) {(_V) <<= 1; /* (_V) |= 0; */} V = 0L; /* ... 0000 0000 0000 0000 */ PushRIGHT(V); /* ... 0000 0000 0000 0001 */ PushRIGHT(V); /* ... 0000 0000 0000 0011 */ PushLEFT(V); /* ... 0000 0000 0000 0110 */ PushRIGHT(V); /* ... 0000 0000 0000 1101 */ PushLEFT(V); /* ... 0000 0000 0001 1010 */ and read it back (albeit in reverse order): #define PopBIT(_V,_bit) {_bit = (_V) & 1L; (_V) >>= 1;} PopBIT(V,isRight); if (isRight) {...} else {...} and so on... I hope this helps you some in 'bitsing' around... chris knox, primary programmer, SpectreTouch "Omne animal triste est post (or without) coitum" Umberto Eco ---------------------------------------------------------------------------- From: salmon@cwis.unomaha.edu (David Salmon) Subject: Re: faster division routine wanted. bayes@hplvec.LVLD.HP.COM writes: > I wrote a routine that seems almost correct, and that allows you to > quickly calculate a set of shifts and add/subs that approximate the > division by any divisor. I know there's some stuff wrong with it, but > thought it's worth sending out now, while the iron is hot. The above mentioned algorithm is essentially correct. However, the implementation: > 230 ! X/1000 = X>>10 + X>>15 - X>>17 + X>>21 > 240 ! The actual division this performs is very close to: > 250 ! X/1000.0724845 (as shown by "approximation" printout.) Does not really reporduce the divide by 1000 correctly. The higher order shifts lose the precision you need for accuracy. You need to shift-add- shift-add as in the following definitions: /* This definition gives accurate results for n < 1 764 866 177 */ #define divNum8(n) ((((((((((((((((-n)>>4)-n)>>3) \ +n)>>2)+n)>>3)+n)>>4)-n)>>2)+n)>>5)+n+1)>>10 /* This definition gives accurate results for n < 1 762 037 711 */ #define divNum7(n) ((((((((((((((-n)>>3) \ +n)>>2)+n)>>3)+n)>>4)-n)>>2)+n)>>5)+n+1)>>10 /* This definition gives accurate results for n < 492 965 001 */ #define divNum6(n) (((((((((((n>>2)+n)>>3)+n)>>4)-n)>>2)+n)>>5)+n+1)&t;>10) /* Use of fewer terms yields unacceptable results */ The definition divNum6 is roughly the equivalent of the x/1000 statement given above, but maintains the precision. The numbers listed above represents the limits for which the macros give the same value as the divide instruction. You must use full registers (32 bits) to get the correct results. On a IIfx, the above macro (divNum8) is about 25% faster than the divide instruction. divNum6 which should be sufficient for your purposes should be faster. Hope this helps. ds -- David C. Salmon salmon@unomaha.edu ---------------------------------------------------------------------------- From: bruner@sp10.csrd.uiuc.edu (John Bruner) Subject: Re: faster division routine wanted. An alternate way to do division is to multiply by the fixed-point reciprocal. In this case, you can do it with quotient = (0x8312 * numerator + 0x8000) >> 25; ---------------------------------------------------------------------------- From: bayes@hplvec.LVLD.HP.COM (Scott Bayes) Subject: Re: (Q) Need bit manipulation algorithm > name, but whose initials are M.A.C. And if you want a solution for > a word, and you don't want a 32K lookup table, you actually have to > _shift_! Shifting is for weenies. But for a word, it's just the sum of the bits in each byte. Code something like this (untested) should do it PDQ. --- * at beginning, d0 contains a word whose "1" bits we wish to count * at end, d2 contains number of 1 bits in its lower byte moveq #0,d1 - clear table index register move.b d0,d1 - lower byte of data word move.b btable(d1.w),d2 - nbits for lower byte (PC relative) lsr.w #8,d0 - access upper byte of data word move.b d0,d1 - upper byte of data word add.b btable(d1.w),d2 - d2 contains total bits of word --- ScottB ---------------------------------------------------------------------------- From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy) Subject: Re: faster division routine wanted. sage.cc.purdue.edu (Broc Seib) writes: >ctvien@cs.concordia.ca (VIEN cam thanh) wrote: >> >> I am looking for a division routine that will divide a 2 bytes number >> by 1000 faster than M68000's DIV instruction. First of all: are you really on the 68000? If not, you should be aware that the '020 and later chips are much faster at division. My 68000 manual pegs a signed division at 158 cycles, +/- less than 10%. You can get a few dozen shifts and moves in that time, so custom code might help. But on an '020 or later, the chip will almost certainly beat you. >Replace your DIV operand with an >LSR (or is it ASR --- which ever one fills in the upper byte with zeros as >it shifts.) LSR. Logical shifts fill with zero, arithmetic shifts copy the sign bit. >With this operand you can _quickly!_ divide by 1024 (shift to >the right 10 bits). i.e. > > move.w #$operand,d0 > lsr.w 10,d0 Actually, you have to put the 10 into a register; you can only shift up to eight bits in immediate mode. You can shift a longword. >If dividing by 1000 _exactly_ is of essence, then you may accomplish this with >a predefined sequence of LSR's and SUBQ's. This, however, may run nearly as >long (or possibly) longer than the DIV operand itself. You would have to >count operand cycles (and compare to DIV) to determine this. > > [sample multiply-by-ten code deleted] If I'm not mistaken, hardcoded division is a lot harder than hardcoded multiplication. Dividing by 1000 is dividing by eight and then dividing by five three times. How do you divide by five? To divide by five, you divide by four and take four-fifths of the remainder. How do you take four-fifths of something? You divide by five and subtract the result from the original. You recurse like this until the numbers get small, which is not all that efficient. And it's no fun to write either. This is why microcode engineers get paid lots of money. If you really need speed, and you're on a 68000, build a 65,536-byte lookup table, so that x/1000==table[x]. Fourteen cycles. Can't beat it! -- Jamie McCarthy Internet: k044477@kzoo.edu AppleLink: j.mccarthy ---------------------------------------------------------------------------- From: sage.cc.purdue.edu (Broc Seib) Subject: Re: faster division routine wanted. In article <5662@daily-planet.concordia.ca>, ctvien@cs.concordia.ca (VIEN cam thanh) wrote: I am looking for a division routine that will divide a 2 bytes number by 1000 faster than M68000's DIV instruction. I am not sure of the application in which you are doing this, but IF you only need an approximate division by 1000, AND this is an operation on a 2-byte integer (not floating point numbers!), then you may take advantage of the processor's speed in "shifting". Replace your DIV operand with an LSR (or is it ASR --- which ever one fills in the upper byte with zeros as it shifts.) With this operand you can _quickly!_ divide by 1024 (shift to the right 10 bits). i.e. move.w #$operand,d0 lsr.w 10,d0 - Broc Seib, Purdue University ---------------------------------------------------------------------------- If that solution is inadequate, then this may possibly help... ---------------------------------------------------------------------------- If dividing by 1000 _exactly_ is of essence, then you may accomplish this with a predefined sequence of LSR's and SUBQ's. This, however, may run nearly as long (or possibly) longer than the DIV operand itself. You would have to count operand cycles (and compare to DIV) to determine this. I will show an analogous application that multiplies a number by 10 (before the days of luxurious MUL operand); I will leave the divide by 1000 story as a homework problem for the reader. The multiply by 10 story is as follows: move.w #$operand,d1 move.w d1,d0 lsl.w 2,d0 add.w d0,d1 lsl.w 1,d0 Verify this multiplication by 10 for yourself w/ pencil & paper. Hint: The divide by 1000 story is simply 3 divide by 10's; just be careful with what you subtract from the "accumulator" register! (I am a little rusty on my assembly; please excuse my mistakes) Good Luck. - Broc Seib ---------------------------------------------------------------------------- From: k044477@hobbes.kzoo.edu (Jamie R. McCarthy) Subject: Re: (Q) Need bit manipulation algorithm