(sorry for the horrible formatting, I couldn't figure out how to make a monospaced font)

In z80 assembly, the division algorithms you might have seem probably seem unintuitive. But guess what? They are usually a variation on long division that you learned in school, but modified for binary. In my experience, math in binary is so much easier and faster than in base 10, so here is an example. Let's try to divide 85 by 3:

(Note: 3=11b and 85=1010101b. The math in this example is all done in binary)

Code:

        ________
     11|1010101

Now you check if 11≤1. It isn't, so you have a zero:
        0_______
     11|1010101

Now you check if 11≤10. It isn't, so you have another zero:
        00______
     11|1010101

Now you check if 11≤101. It is, so you subtract 101-11 to get 10.
        001_____
     11|1010101
         10

Now you check if 11≤100. It is, so you subtract 100-11 to get 1.
        0011____
     11|1010101
         10
           1
Now you check if 11≤11. It is, so you subtract 11-11 to get 0.
        00111___
     11|1010101
         10
           1
            0
Now you check if 11≤00. It isn't, so you put a zero up top.
        001110__
     11|1010101
         10
           1
            0
Now shift in the last bit, a 1.Since 11≤01 is not true, we have a zero.
        0011100_
     11|1010101
         10
           1
            0

We have a remainder of 1, and the quotient is 11100 which is 28. 28*3=84, add 1 and you have 85. Now how do we implement this in pseudocode?

P is the numerator
Q is the denominator
(we are essentially doing P/Q, then)
Let B be the number of bits
Let C be the remainder between each step
Let A be the accumulator

Code:

   0→C
   0→A
Loop:
   A<<    (I mean to say, shift A left, also the same as multiply by 2)
   Q<<C   (I mean to say, shift Q left, shifting the carry into C
   If C≤P
     C-P→C
     A++  (increment A. Since A was multiplied by 2, the last bit would be zero. This sets it to 1)
   B--
   If B
   Goto Loop


But what would this look like in assembly? Here is an example:

Code:

;we will do D/E
     xor a          ;0→C      we let A be the remainder
     ld bc,0800h    ;8→B, 0→A We let C be the accumulator
Loop:
     sla c          ;A<<
     rl e \ rla     ;Q<<C
     cp d           ;If C≤P
     jr nc,$+4
       sub d        ;C-P→C
       inc c        ;A++
     djnz Loop      ;B-- \ If B \ Goto Loop
     ret


This works, but can we optimise it for the z80? Of course! Note that the registers E and C are 8 bits. Since we are shifting a bit out of E each time, we can use that doubly as an accumulator. here is what I mean:

Code:

;we will do D/E
     xor a          ;0→C      we let A be the remainder
     ld b,8         ;8→B      we will let Q be the accumulator
Loop:
     rl e \ rla     ;Q<<C, also A<<
     cp d           ;If C≤P
     jr nc,$+4
       sub d        ;C-P→C
       inc e        ;A++
     djnz Loop      ;B-- \ If B \ Goto Loop
     ret

This routine is pretty good for D/E/ How good? 14 bytes and 344+3b cycles (where b is at most 8) This can basically execute over 17000 times a second at 6MHz, over 43000 at 15MHz.

I hope this helps people!
This is a great explanation, Xeda; you get a serious gold star. I've been using Z80 Bits' routines for ages, which as you know use this method. I did eventually figure out how it worked and that it was essentially bit-wise long division, clued in by the fact that the loop counters tended to more or less match the numerator/denominator bit width, but this would have been very helpful. Thanks for sharing this with us!
Thanks! There have been several people asking how the algorithm works, so I thought I might try to explain it Smile In my spare time, I am developing a document that goes in depth on every aspect of z80 assembly coding that I can think about. Only pieces of it will be released, I am sure, but division has yet to make it in there. I couldn't think of a clear way to explain it before, but I will probably use this.
in the explenation, you always check if 11 <= a number, but from where do you get that number?
That is 11b, which is 3. That is the number that I was dividing by.
I actually meant the number behind the <=. First you use 1, the first digit, then 10, the first 2 digits, then 101, the first 3 digits, then suddenly 100. But where does the 100 come from?
Because I subtract the 11 from 101, giving me 10. Rotate in the next bit, a 0, and you get 100 Smile
The topic of fixed point division was brought up in SAX.

In decimal, if we divide 99.33/.33, we can multiply numerator and denominator by 100 and get two integers to use. Basically:
99.33/.33=9933/33
Likewise, if we are using 8.8 fixed point numbers, if we want to divide them we can simply treat them as two sixteen bit numbers. In a normal 16-bit division algorithm, you will have "ld b,16" and loop the algorithm 16 times. However, since you want extra precision after the decimal, (in this case, 8 extra bits), we would just loop it 8 more times. Essentially, change "ld b,16" to "ld b,24" . Normally, this will remove the top 8 bits of the integer part, so you will need to be careful of that.

Please correct me if I am wrong as I've never actually implemented this myself Razz I am worried about doing something like dividing by .5 (which is multiplying by 2) or anything else like that, so feel free to report back!
Just posting to point out that nonrestoring division is an extension of this which in general is going to be about twice as fast as straight long division ("restoring division"). Basically, instead of either adding or not adding a factor in each binary place, you choose to either add or subtract it, and rely on the overflow properties of two's complement arithmetic to converge twice as quickly. http://en.wikipedia.org/wiki/Division_algorithm#Non-restoring_division


However both of those algorithms are classified as "slow division" algorithms, and other algorithms will have even better performance Smile
Ok, Does anyone know if this is going too work?


Code:

#define cpAHLCDE ld b,a \ ld a,l \ sub e \ ld a,h \ sbc a,d \ ld a,b \ sbc a,c \ ld a,b

#define loadAHL(pointer) ld a, (pointer) \ ld hl, (pointer+1)
#define saveAHL(pointer) ld (pointer), a \ ld (pointer+1), hl

#define loadCDE(pointer) ld b, a \ ld a, (pointer) \ ld c, a \ ld a, b \ ld hl, (pointer+1)
#define saveCDE(pointer) ld b, a \ ld a, c \ ld (pointer), a \ ld a, b \ ld (pointer+1), hl

#define ldAHLCDE ld a, c \ ld h, d \ ld l, e
#define ldCDEAHL ld c, a \ ld d, h \ ld e, l

#define FPadd add hl, de \ adc a, c
#define FPadc adc hl, de \ adc a, c
#define FPsub sub c \ sbc hl, de

;fixed-point: run loop 32 times instaed of 24
;RoutineRam = 12-byte area of RAM

;RoutineRam = backup of numerator
;RoutineRam+3 = backup of denominator
;RoutineRam+6 = remainder
;RoutineRam+9 = accumulator

FPDiv:
  ;divides ahl by cde

  ;backup numerator and denominator
  saveAHL(RoutineRam)
  saveCDE(RoutineRam+3)

  ;zero the remainder and accumulator
  ld hl, RoutineRam+6
  ld (hl), 0
  ld de, RoutineRam+7
  ld bc, 5
  ldir

  ;repeat loop 32 times
  ld b, 32
 
FPDivLoop:
  push bc
  loadAHL(RoutineRam+9)
  ldCDEAHL
  FPadd
  saveAHL(RoutineRam+9)
  loadAHL(RoutineRam+3)
  ldCDEAHL
  FPadc
  saveAHL(RoutineRam+3)
  loadCDE(RoutineRam)
  loadAHL(RoutineRam+6)
  cpAHLCDE
  jr nc, CLargerThanP

    FPsub
    saveAHL(RoutineRam+6)
    loadAHL(RoutineRam+9)
    ld de, 0
    inc a
    adc hl, de
    saveAHL(RoutineRam+9)

CLargerThanP:
  pop bc
  djnz FPDivLoop


EDIT: currentely, at 14043 cycles, it's very slow, allowing only 570 executions per secound at 8MHz and 1068 at 15MHz. Can this be optimized?
If you don't mind using shadow registers, this might work nicely:

Code:
FPDiv:
   ; Divides AHL by CDE, returns result in AHL
   push hl
   ld l,a
   xor a
   ld h,a
   exx
   pop hl
   ld c,h
   ld h,l
   ld l,a
   ld b,24
FPDivLoop:
   add hl,hl
   rl c
   exx
   adc hl,hl
   rla
   sbc hl,de
   sbc a,c
   jr nc,FPDivNoRestore
   add hl,de
   adc a,c
   exx
   djnz FPDivLoop
   ld a,c
   ret
FPDivNoRestore:
   exx
   inc l
   djnz FPDivLoop
   ld a,c
   ret


By the way, your FPsub macro is bugged, it should be or a \ sbc hl,de \ sbc a,c
Thank you for both the optimization and the fix.
If you want, I can modify the division routine to work with signed values.
I already did. This is the full division routine so far:

Code:

FPDiv:
  ;divides ahl by cde

  ;handle sign
  push af      ;1
  xor c
  res 0, (IY+RoutineFlags)
  bit 7, a
  jr nz, ResultPos2
  set 0, (IY+RoutineFlags)
ResultPos2:
  pop af      ;0
  bit 7, a
  call nz, negAHL
  bit 7, c
  call nz, negCDE

FPDiv:
   ; Divides AHL by CDE, returns result in AHL
   push hl
   ld l,a
   xor a
   ld h,a
   exx
   pop hl
   ld c,h
   ld h,l
   ld l,a
   ld b,24
FPDivLoop:
   add hl,hl
   rl c
   exx
   adc hl,hl
   rla
   sbc hl,de
   sbc a,c
   jr nc,FPDivNoRestore
   add hl,de
   adc a,c
   exx
   djnz FPDivLoop
   ld a,c
   ret
FPDivNoRestore:
   exx
   inc l
   djnz FPDivLoop
   ld a,c

  ;sign
  bit 0, (IY+RoutineFlags)
  call nz, negAHL
  ret
Here's my signed version (which has inline optimized negations):


Code:
FPDiv:
   ; Divides AHL by CDE, returns result in AHL
   xor c
   push af
   xor c
   jp p,FPDividendPos
   ld b,a
   xor a
   sub l
   ld l,a
   ld a,0
   sbc a,h
   ld h,a
   ld a,0
   sbc a,b
FPDividendPos:
   push hl
   ld l,a
   xor a
   ld h,a
   bit 7,c
   jr nz,FPDivisorNeg
   sub e
   ld e,a
   ld a,h
   sbc a,d
   ld d,a
   ld a,h
   sbc a,c
   ld c,a
   ld a,h
FPDivisorNeg:
   exx
   pop hl
   ld c,h
   ld h,l
   ld l,a
   ld b,24
FPDivLoop:
   add hl,hl
   rl c
   exx
   adc hl,hl
   rla
   add hl,de
   adc a,c
   jr c,FPDivNoRestore
   sbc hl,de
   sbc a,c
   exx
   djnz FPDivLoop
   jr FPDivEnd
FPDivNoRestore:
   exx
   inc l
   djnz FPDivLoop
FPDivEnd:
   pop af
   ld a,c
   ret p
   xor a
   sub l
   ld l,a
   ld a,b
   sbc a,h
   ld h,a
   ld a,b
   sbc a,c
   ret


Edit: Changed it to force the divisor to be negative so I could use an addition as a comparison
Edit2: Typo

Edit3: Here's a version with saturation (clips the values to the range of -0x8000.00 to 0x7FFF.FF)
Edit4: It turns out I wasn't saturating to the signed range, so this new version combines the first division iteration with the saturation check.
Edit5: Removed extraneous EXX
Edit6: Implemented a small RunerOpt
Edit7: Small size optimization for the saturation case

Code:
FPDiv:
   ; Divides AHL by CDE, returns result in AHL
   xor c
   push af
   xor c
   jp p,FPDividendPos
   ld b,a
   xor a
   sub l
   ld l,a
   ld a,0
   sbc a,h
   ld h,a
   sbc a,a
   sub b
FPDividendPos:
   push hl
   ld l,a
   xor a
   ld h,a
   bit 7,c
   jr nz,FPDivisorNeg
   sub e
   ld e,a
   ld a,h
   sbc a,d
   ld d,a
   ld a,h
   sbc a,c
   ld c,a
   ld a,h
FPDivisorNeg:
   exx
   pop hl
   ld c,h
   ld h,l
   ld l,a
   
   ; First iteration and saturation check
   add hl,hl
   rl c
   exx
   add hl,de
   adc a,c
   jr c,FPDivSaturate
   sbc hl,de
   sbc a,c
   exx
   
   ; Next 23 iterations
   ld b,23
FPDivLoop:
   add hl,hl
   rl c
   exx
   adc hl,hl
   rla
   add hl,de
   adc a,c
   jr c,FPDivNoRestore
   sbc hl,de
   sbc a,c
   exx
   djnz FPDivLoop
   jr FPDivEnd
FPDivNoRestore:
   exx
   inc l
   djnz FPDivLoop
FPDivEnd:
   pop af
   ld a,c
   ret p
   xor a
   sub l
   ld l,a
   ld a,b
   sbc a,h
   ld h,a
   ld a,b
   sbc a,c
   ret
   
FPDivSaturate:
   sbc hl,hl ;HL = $FFFF
   pop af
   ld a,$7F
   ret p
   inc a
   inc hl
   ret
Heavy bump,

Sorry to revive such an old topic, but as I needed a 24-bits division routine (AHL by CDE fits nicely), I tried both of calc84's routines and both give me only wrong results. Even dividing by 1 won't work. Doing this gives me 32768 :

Code:
 xor a
 ld hl, 32000
 ld c, a
 ld de, 10
 call FPDiv
 bcall _dispHL
 ret

And putting 1 in DE gives 0. I triple-checked each routine, I didn't make any flaw while recopying.
matrefeytontias wrote:
Heavy bump,

Sorry to revive such an old topic, but as I needed a 24-bits division routine (AHL by CDE fits nicely), I tried both of calc84's routines and both give me only wrong results. Even dividing by 1 won't work. Doing this gives me 32768 :

Code:
 xor a
 ld hl, 32000
 ld c, a
 ld de, 10
 call FPDiv
 bcall _dispHL
 ret

And putting 1 in DE gives 0. I triple-checked each routine, I didn't make any flaw while recopying.


I barely even remember working on this now, but... I think those are the expected results. Keep in mind that these routines work on 16.8 fixed point values. So, you're actually dividing by 10/256 and 1/256, for which the low 16 bits will indeed be 32768 and 0 respectively (the high bits in A giving the rest of the results). Do you actually want a routine that just works on normal integers?
Have you checked out Xeda's routines?

EDIT: Xeda's routines also work on floats:
Quote:
24-bit floats are organised as:
1 bit sign
7 bits signed exponent
1 bit integer part of the mantissa
15 bits fractional part of the mantissa
For example, pi could be represented by:
ld a,1 ;sign is positive, exp=1 (the mantissa is multiplied by 2^1)
ld hl,$C910 ;pi, rounded to 16 bits. C910 = 51472, 51472/32768*2^1=3.1416015625
Woops. I didn't see those worked with 16.8, sorry. Anyway, I found a "normal" AHL/CDE routine from Runer112 here on Omnimaga, so it's okay.

chickendude yeah, the thing is that they only work on floats, and I want integer division Razz

EDIT : by the way, if you want to waste time optimizing my ugly code, here's the full routine, including the wrapper I put around Runer's code to make it signed :


Code:
sDiv24:
  xor c
  push af
  xor c
  jp p, _sDiv24_divdPos
  ld b, a
  xor a
  sub l
  ld l, a
  ld a, 0
  sbc a, h
  ld h, a
  ld a, 0
  sbc a, b
_sDiv24_divdPos:
  bit 7, c
  ld b, a
  jr z, _sDiv24_divrPos
  xor a
  sub e
  ld e, a
  ld a, 0
  sbc a, d
  ld d, a
  ld a, 0
  sbc a ,c
  ld c, a
  ld a, b
_sDiv24_divrPos:

  push hl
  pop ix
  ld l, 24
  push hl
  xor a
  ld h, a
  ld l, a
_sDiv24_loop:
  add ix, ix
  rl b
  adc hl, hl
  rla
  cp c
  jr c, _sDiv24_skip
  jr nz, _sDiv24_setBit
  sbc hl, de
  add hl, de
  jr c, _sDiv24_skip
_sDiv24_setBit:
  sbc hl, de
  sbc a, c
  inc ix
_sDiv24_skip:
  ex  (sp), hl
  dec l
  ex  (sp), hl
  jr  nz, _sDiv24_loop
  pop de
  ld c, a
  push ix
  pop de
  ex de, hl

  pop af
  ld a, b
  ret p
  xor a
  sub l
  ld l, a
  ld a, 0
  sbc a, h
  ld h, a
  ld a, 0
  sbc a, b
  ret

Tested, works.
  
Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page
Page 1 of 1
» All times are UTC - 5 Hours
 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

 

Advertisement