Help with ECC Point Doubling and Addition - Cemetech | Forum

» Forum > Calculator Programming > z80 & ez80 Assembly

»
» View previous topic :: View next topic

ACagliano
Guru-in-Training (Posts: 2001)

Help with ECC Point Doubling and Addition
29 Nov 2022 08:55:38 am

So I've run up against the point in work on elliptic curve diffie hellman in which I need to implement point addition and doubling in a manner that will not take 6 hours to complete. Here's the research I've done so far and what, from what I understand, is needed:

Point Addition

Code:




Point addition seems to be an operation relative to the slope of the line that links both points, defined:


P + Q = R


if P or Q are point at infinity (0,0?), R = the other point


if P == Q, double instead


if Xp == Xq, set R to point at infinity.


else:


slope = (Yp - Yq) / (Xp - Xq)


Xr = slope^2 + Xp - Xq


Yr = slope(Xp - Xr) - Yp

Point Doubling

Code:




P + P = R


compute slope as:


slope = (3(Xp)^2 + a) / 2(Yp)


use slope in the Xr, Yr computations for addition

I foresee a few issues trying to implement this:
1. Value overflow. From what I can tell, this doesn't involve modulo some type width, you need to work with exact values. So would this need a few bytes to buffer for overflow?
2. Computational cost for division. The curve I am using is a 224-bit curve, meaning this will be running that many divisions * 2. Isn't that expensive? I read something in the docs about deferring division, ie: maintaining a numerator and a denominator until the end and then computing division then? How does that work exactly? Is there a better way?
3. Fastest algorithm for multiplication? As of now I'm using the double-then-add method for multiplication, but is there a faster way? If so, how does it work.

My existing API uses:

Code:




#define ECC_PRV_KEY_SIZE   28


...


#define OVERFLOW_BYTES      4





// a bigint type constrainted to the private key size + 4 bytes for overflow buffering


typedef uint8_t BIGINT[ECC_PRV_KEY_SIZE + OVERFLOW_BYTES];





struct Point {


   BIGINT x;


   BIGINT y;


};





struct Curve {


   BIGINT polynomial;


   BIGINT coeff_a;


   BIIGINT coeff_b;


   Point G;


   BIGINT b_order;


   uint8_t cofactor;


};

I save all static data as big-endian encoded bytearrays, but when loading the data in for arithmetic, the array is reversed into little endian for more efficient mathing.

What I am seeking here is (1) recommendations for the best way to do the 4 issues I indicated above, as well as to fish for if anyone has, or can write up, fast bigint arithmetic routines supporting up to 32-bytes. Or at least can point to the best resources for implementing it, though I'd be doing it in C which would not have the ideal speed. It would seem we need addition, subtraction, multiplication, division, and squaring.
Timing resistance preferred.

For reference, the full API as it presently stands is:
https://github.com/acagliano/cryptx/blob/dev/src/ecdh.c

And I've taken some stabs at writing the bigint routines I think I can do myself:
Here they are (ps the shifting was written by Zeda. I've done add/sub/mul)
https://github.com/acagliano/cryptx/blob/dev/src/asm/ecdh_ops.asm

ACagliano
Guru-in-Training (Posts: 2001)

01 Dec 2022 06:11:44 pm

Update
With a lot of help from the likes of calc84maniac and jacobly, I have managed to work out bigint addition and multiplication. I'll post the code here if anyone else wants to recommend fixes or optimizations.
I am still missing multiplicative inversion. To that end, I will post the algorithm based on my reference for anyone who may have a fast way to do it.

Addition/Subtraction

Code:




; bigint_add(uint8_t *op1, uint8_t *op2);


; hard limit to 32 bytes


; output in op1


; addition over a galois field of form GF(2^m) is mod 2 or just xor


_bigint_add:


   ti._frameset0


   ld hl, (ix + 3)      ; op2


   ld de, (ix + 6)      ; op1


   ld b, 32


.loop:


   ld a,(de)


   ;adc a,(hl)


   xor (hl)


   ld (de),a


   inc hl


   inc de


   djnz .loop


   ld sp, ix


   pop ix


   ret





; apparently they are the same


_bigint_sub := _bigint_add

Multiplication

Code:




; bigint_mul(uint8_t *op1, uint8_t *op2)


; multiplication is add then double, then a polynomial reduction


_bigint_mul:


   ld hl, 32


   ti._frameset


   lea de, ix - 32      ; stack mem?


   ld hl, (ix + 9)      ; op1 (save a copy)


   ld bc, 32


   ldir            ; ix - 32 = tmp = op1


   


   ; zero out op1


   ld de, (ix + 9)      ; op 1


   ld a, 0


   ld (de), a


   inc de


   ld hl, (ix + 9)


   ld bc, 31


   ldir            ; op1 = res = 0


   


   ld hl, (ix + 6)      ; op2 = for bit in bits


   ld c, 32


.loop_op2


   ld a, (hl)


   ld b, 8


.loop_bits_in_byte:


   rra


   push af


      sbc a,a


      push bc, hl


         ld c,a


         


         ; add op1 (res) + tmp


         ld hl, (ix +9)      ; hl = op1 (dest)


         lea de, ix - 32      ; de = tmp (src)


         ld b, 32


.loop_add:


         ld a, (de)


         and a, c


         xor a, (hl)


         ld (hl), a


         inc hl


         inc de


         djnz .loop_add


      


         ; now double tmp


         lea hl, ix - 32      ; tmp in hl


         ld b, 32


         or a            ; reset carry


.loop_mul2:


         inc hl


         rl (hl)


         djnz .loop_mul2


         


         ; now xor with polynomial if tmp degree too high


         ; this means timing analysis will leak polynomial info


         ; however, this is a public spec and therefore not


         ; implementation-breaking


         bit 1, (ix - 4)      ; polynomial is 233 bits, check 234th bit


         jr z, .no_xor_poly





         ; xor byte 1 (little-endian encoding)


         ld a, (ix - 32 + 1)


         xor 2


         ld (ix - 32 + 1), a


         


         ; xor byte 21 (little endian encoding)


         ld a, (ix - 32 + 21)


         xor 4


         ld (ix - 32 + 21), a


         


         ; xor byte 28 (little endian encoding)


         ld a, (ix - 32 + 28)


         xor 1


         ld (ix - 32 + 28), a


         


.no_xor_poly


      pop hl,bc


   pop af


   djnz .loop_bits_in_byte


   inc hl


   dec c


   jr nz, .loop_op2


   ld sp, ix


   pop ix


   ret

And lastly:
Algorithm for Mult Inv

Code:




_bigint_invert:


   ; tmp = op


   ; v = polynomial


   ; g = 0


   ; res = 1


   ; while(tmp != 1)


   ;    i = degree(tmp) - degree(v)


   ;   if( i < 0 )


   ;      swap tmp1, v


   ;      swap g, res


   ;      i = -i


   ;   h = left_shift v by i bits


   ;   add tmp, h


   ;   h = left_shift g by i bits


   ;   add res, h


        ; return res

ACagliano
Guru-in-Training (Posts: 2001)

04 Dec 2022 09:53:32 pm

So I have completed the bigint routine for multiplicative inverse over a finite field of form F2m.

The code is here: https://github.com/acagliano/cryptx/blob/stable/encrypt/encrypt.asm#L6231
There are some issues with the algorithm. For reasons currently unknown to me the algorithm seems to work fine for powers of 2, but loop infinitely for anything that is not an even power of two. Several days of debugging have brought it to its current state from infinitely looping for everything. If anyone is good with this kind of math and wants to take a stab at determining how dumb I am, feel free. Please and thanks.

Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page

Reply to topic

»
» View previous topic :: View next topic

Page 1 of 1

» All times are UTC - 5 Hours