So I've run up against the point in work on elliptic curve diffie hellman in which I need to implement point addition and doubling in a manner that will not take 6 hours to complete. Here's the research I've done so far and what, from what I understand, is needed:

Point Addition

Point Doubling

I foresee a few issues trying to implement this:

1. Value overflow. From what I can tell, this doesn't involve modulo some type width, you need to work with exact values. So would this need a few bytes to buffer for overflow?

2. Computational cost for division. The curve I am using is a 224-bit curve, meaning this will be running that many divisions * 2. Isn't that expensive? I read something in the docs about deferring division, ie: maintaining a numerator and a denominator until the end and then computing division then? How does that work exactly? Is there a better way?

3. Fastest algorithm for multiplication? As of now I'm using the double-then-add method for multiplication, but is there a faster way? If so, how does it work.

My existing API uses:

I save all static data as big-endian encoded bytearrays, but when loading the data in for arithmetic, the array is reversed into little endian for more efficient mathing.

What I am seeking here is (1) recommendations for the best way to do the 4 issues I indicated above, as well as to fish for if anyone has, or can write up, fast bigint arithmetic routines supporting up to 32-bytes. Or at least can point to the best resources for implementing it, though I'd be doing it in C which would not have the ideal speed. It would seem we need addition, subtraction, multiplication, division, and squaring.

Timing resistance preferred.

For reference, the full API as it presently stands is:

https://github.com/acagliano/cryptx/blob/dev/src/ecdh.c

And I've taken some stabs at writing the bigint routines I think I can do myself:

Here they are (ps the shifting was written by Zeda. I've done add/sub/mul)

https://github.com/acagliano/cryptx/blob/dev/src/asm/ecdh_ops.asm

Point Addition

**Code:**```
```

Point addition seems to be an operation relative to the slope of the line that links both points, defined:

P + Q = R

if P or Q are point at infinity (0,0?), R = the other point

if P == Q, double instead

if Xp == Xq, set R to point at infinity.

else:

slope = (Yp - Yq) / (Xp - Xq)

Xr = slope^2 + Xp - Xq

Yr = slope(Xp - Xr) - Yp

Point Doubling

**Code:**```
```

P + P = R

compute slope as:

slope = (3(Xp)^2 + a) / 2(Yp)

use slope in the Xr, Yr computations for addition

I foresee a few issues trying to implement this:

1. Value overflow. From what I can tell, this doesn't involve modulo some type width, you need to work with exact values. So would this need a few bytes to buffer for overflow?

2. Computational cost for division. The curve I am using is a 224-bit curve, meaning this will be running that many divisions * 2. Isn't that expensive? I read something in the docs about deferring division, ie: maintaining a numerator and a denominator until the end and then computing division then? How does that work exactly? Is there a better way?

3. Fastest algorithm for multiplication? As of now I'm using the double-then-add method for multiplication, but is there a faster way? If so, how does it work.

My existing API uses:

**Code:**```
```

#define ECC_PRV_KEY_SIZE 28

...

#define OVERFLOW_BYTES 4

// a bigint type constrainted to the private key size + 4 bytes for overflow buffering

typedef uint8_t BIGINT[ECC_PRV_KEY_SIZE + OVERFLOW_BYTES];

struct Point {

BIGINT x;

BIGINT y;

};

struct Curve {

BIGINT polynomial;

BIGINT coeff_a;

BIIGINT coeff_b;

Point G;

BIGINT b_order;

uint8_t cofactor;

};

I save all static data as big-endian encoded bytearrays, but when loading the data in for arithmetic, the array is reversed into little endian for more efficient mathing.

What I am seeking here is (1) recommendations for the best way to do the 4 issues I indicated above, as well as to fish for if anyone has, or can write up, fast bigint arithmetic routines supporting up to 32-bytes. Or at least can point to the best resources for implementing it, though I'd be doing it in C which would not have the ideal speed. It would seem we need addition, subtraction, multiplication, division, and squaring.

Timing resistance preferred.

For reference, the full API as it presently stands is:

https://github.com/acagliano/cryptx/blob/dev/src/ecdh.c

And I've taken some stabs at writing the bigint routines I think I can do myself:

Here they are (ps the shifting was written by Zeda. I've done add/sub/mul)

https://github.com/acagliano/cryptx/blob/dev/src/asm/ecdh_ops.asm