I was working with Markov chains and a dictionary of 264060 English words a short while ago, and one of the by-products was a list of the 676 possible A-to-Z letter pairings, ordered by their frequency. (Unfortunately, I only allowed up to one "hit" of each pairing per word, so the data might be slightly skewed for the purposes of a compression algorithm.)
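If you want to reproduce the tally, something along these lines should do it (a rough Python sketch; "words.txt" is just a stand-in for whatever word list you use, one word per line):

```python
from collections import Counter

# Rough sketch of the pair tally described above; "words.txt" is a placeholder
# for the dictionary file, not the actual list used for the rankings below.
def pair_frequencies(path="words.txt"):
    counts = Counter()
    with open(path) as f:
        for line in f:
            word = line.strip().upper()
            # Every adjacent letter pairing in the word, counted at most
            # once per word (the single-"hit" rule mentioned above).
            counts.update({word[i:i + 2] for i in range(len(word) - 1)
                           if word[i:i + 2].isalpha()})
    return counts

# e.g. pair_frequencies().most_common(10) gives the ten top-ranked pairings.
```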
Note that the calculator has both one- and two-byte tokens; the latter are inherently unsuitable for compressing anything shorter than three bytes. 240 of the single-byte tokens can be written without ASM (setting aside the double quote, since it can't normally be read from a string), so a bijection of 240 pairings is ideal. Here are the pairings that ranked highest:
ES IN ER TI AT TE ON IS RE NG EN ST ED AL RI LI RA LE AN NE SE AR IC OR RO NT IT CO IO IE SS LA DE NS CA RS NI TR TA DI AS HE TO UN ME CH LL LO OL EL ET OU SI MI MA PE IL VE LY NA AC US OM IA EA TH HI UR HO ND TS OS EC NO CE PH UL PR HA NC GE OP SH AB PO PA OT CI MO OG EM BL AM PI ID AP IZ UT CT SC SA BI AD GI SU OC SP BE IM BA OO SO CR CK AG UM EE RT GR KE EP IV SM IR FI BO ZE AI MP GA IG VI TU PL DO CU TT OD LU DA IP RU GS RM OV HY MS LS DS TY QU RY KI BR IF RR RD WA FO AU OW FE GL CL RC EX OI VA BU UP DR MB UB UC GO EO OB FL PS EG IB RN UI MU UA EI PP FA FU EF MM WE PT NN LT WI GU PU WO KS SL FF OA EV DU UE YS HR AV GH RP SN RG TL RB UD EU YP ZI NU ZA FR EB AK SY YL AY EW AE DL NF HU GN GG OE RL NK HT UG LD
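To illustrate the substitution itself, here is a rough sketch of greedy left-to-right replacement (the PAIRS set is abbreviated to five entries; in practice it would hold all 240 pairings above, each mapped to one of the spare single-byte tokens):

```python
# Greedy pair-for-token substitution, assuming each ranked pairing maps to one
# of the 240 spare single-byte tokens. PAIRS is abbreviated here for brevity.
PAIRS = {"ES", "IN", "ER", "TI", "AT"}

def tokenize(text):
    tokens = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in PAIRS:
            tokens.append(pair)      # two letters collapse into one one-byte token
            i += 2
        else:
            tokens.append(text[i])   # anything else passes through unchanged
            i += 1
    return tokens

# With the abbreviated set: ['IN', 'T', 'ER', 'ES', 'TI', 'N', 'G'] -- 7 tokens
# instead of 11 characters; the full 240-pair list would do better still.
print(tokenize("INTERESTING"))
```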
This all assumes a rather simplified set of items to replace. If "RING" happens to be more prevalent than "LD" (and it is), it still hasn't been accounted for. The solution would be to go through all possible letter strings of any length to find the choicest combinations; the four-character strings alone make a field of 26^4 = 456976 possibilities. Happy hunting!
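If anyone does go hunting, something like this is a reasonable starting point (a rough Python sketch; `words` stands in for the dictionary as a list of uppercase words):

```python
from collections import Counter

# Rather than testing all 26^4 = 456976 four-letter strings one by one, tally
# only the four-letter substrings that actually occur in the dictionary, which
# ranks the same candidates far more quickly.
def best_quads(words, top=20):
    counts = Counter()
    for w in words:
        counts.update(w[i:i + 4] for i in range(len(w) - 3))
    # Replacing a four-letter run with a single one-byte token saves three
    # bytes per occurrence, so rank by total bytes saved.
    return [(quad, n * 3) for quad, n in counts.most_common(top)]
```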