68k Latin to English Translator (Dev Thread)

Sam · Power User (Posts: 441)

The following documentation has been quoted for posterity, but it is not representative of the current state of this program.

Travis · Power User (Posts: 428)

Have you considered using a binary search algorithm for finding words? It may help a bit. (Though only if you're using ASCII, as AMS's string comparison operators are buggy with character codes >127, such as accented letters.) Even so, I suspect the list lookup routines on 68K calcs aren't particularly efficient; they probably do a linear scan through the list to find the nth item.

Another consideration you may want to be aware of before you get too far is that in my experience, TI-BASIC has problems as soon as a variable grows beyond a certain size. Technically there is a hard limit of about 64K per variable, but you may encounter memory errors working with variable sizes of only half of this or less. When I wrote programs that stored large datasets in lists and matrices, I ended up having to break them up into several variables. If you separated your database into separate variables in some way (for instance, one for each starting letter of the word), this may improve performance and defer any memory issues you might encounter if the dictionary grows large. The # (indirection) operator or the expr() function come in handy when using schemes like this.
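A rough sketch of the per-letter split, in Python rather than TI-BASIC purely for illustration. The `lat` prefix and both function names are made up for this example; on the calculator the bucket would be selected with something like `#("lat"&letter)` instead of a dictionary lookup.

```python
# Hypothetical sketch: one bucket per starting letter, selected by name.
# The "lat" prefix and the function names are assumptions, not from the thread.

def build_buckets(words):
    """Group words into sorted buckets named 'lat' + first letter."""
    buckets = {}
    for word in sorted(words):
        if word:  # skip empty strings, which have no first letter
            buckets.setdefault("lat" + word[0], []).append(word)
    return buckets

def lookup(buckets, word):
    """Return (bucket_name, 1-based position), or None if the word is absent."""
    if not word:
        return None
    name = "lat" + word[0]
    bucket = buckets.get(name, [])
    if word in bucket:
        return name, bucket.index(word) + 1
    return None

buckets = build_buckets(["amo", "arma", "cano", "bellum", "virum"])
print(lookup(buckets, "arma"))
```

Each bucket stays small, so even a linear scan inside one bucket touches far fewer entries than a scan of the whole dictionary.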

Sam · Power User (Posts: 441)

That would indeed take a while! If I were to approximate it off the top of my head, I'd say that with the vocabulary and grammar knowledge required, combined with the Aeneid's massive length of 9896 lines of dactylic hexameter, it's probably a 9-month job or so.

Legoman314 · Member (Posts: 145)

Could you try getting around this by having 26 lists, one for each letter, so that each list is much shorter than the original long one? Not sure if this would make a difference or not. Keep up the good work!

CodertheBarbarian · Member (Posts: 126)

I would probably implement the word storage using linked lists. You can find some linked list routines in Michael Abrash’s Graphics Programming Black Book, in the section on linked lists (written in C, of course, but they're pretty easy to understand).

Here are my own thoughts.
First, you need a format for the links between records. Since you're doing it with many separate lists that you want to use as general memory, I'd recommend something like XX###, where XX is a two-character list name and ### is the entry within that list. (The indirection operator makes this a lot easier...) Second, to make sure it can actually find words alphabetically quickly, I would have a list that simply stores a link to the alphabetically first word for each letter (i.e. the word starting with C that comes before all of the other words starting with C). To make sure memory is reused if you ever have to remove entries, I'd recommend a couple more lists. One would contain the orphaned links (i.e. entries that have been deleted but sit in the middle of a list). The other would store the next free location in each of the lists (the next place you can store to).

I'd recommend using linked lists for this because it's so much easier to add records in the middle of a list and such. If you need to look at the routines, take a look at this: http://www.jagregory.com/abrash-black-book/#linked-lists.
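To make the scheme above concrete, here is a minimal sketch in Python (not TI-BASIC, and not taken from the Black Book). Everything in it is an assumption made for illustration: the class name `WordStore`, the pool names `aa` and `ab`, the pool size, and the use of an empty string as the end-of-chain link. The "next free location" list is replaced here by simply using each pool's current length.

```python
# Sketch only: names, sizes and layout are assumptions, not the thread's design.
NIL = ""  # an empty link marks the end of a chain

class WordStore:
    def __init__(self, pool_names=("aa", "ab"), pool_size=999):
        # Each pool is a list of [word, next_link] records.
        self.pools = {name: [] for name in pool_names}
        self.pool_size = pool_size
        self.heads = {}    # first letter -> link to alphabetically first word
        self.orphans = []  # links of deleted records, available for reuse

    def _get(self, link):
        """Resolve an 'XX###' link to its [word, next_link] record."""
        return self.pools[link[:2]][int(link[2:]) - 1]

    def _alloc(self, word):
        """Reuse an orphaned slot if any, else append to a pool with room."""
        if self.orphans:
            link = self.orphans.pop()
            rec = self._get(link)
            rec[0], rec[1] = word, NIL
            return link
        for name, pool in self.pools.items():
            if len(pool) < self.pool_size:
                pool.append([word, NIL])
                return f"{name}{len(pool):03d}"
        raise MemoryError("all pools are full")

    def insert(self, word):
        """Link the word into its letter's chain, keeping alphabetical order."""
        link = self._alloc(word)
        letter = word[0]
        prev, cur = NIL, self.heads.get(letter, NIL)
        while cur != NIL and self._get(cur)[0] < word:
            prev, cur = cur, self._get(cur)[1]
        self._get(link)[1] = cur
        if prev == NIL:
            self.heads[letter] = link
        else:
            self._get(prev)[1] = link
        return link

    def remove(self, word):
        """Unlink the word and remember its slot as orphaned."""
        letter = word[0]
        prev, cur = NIL, self.heads.get(letter, NIL)
        while cur != NIL and self._get(cur)[0] != word:
            prev, cur = cur, self._get(cur)[1]
        if cur == NIL:
            return False
        nxt = self._get(cur)[1]
        if prev == NIL:
            self.heads[letter] = nxt
        else:
            self._get(prev)[1] = nxt
        self.orphans.append(cur)
        return True

    def words(self, letter):
        """Walk one letter's chain and return its words in order."""
        out, cur = [], self.heads.get(letter, NIL)
        while cur != NIL:
            rec = self._get(cur)
            out.append(rec[0])
            cur = rec[1]
        return out
```

Inserting in the middle of a chain only rewrites two links, which is the advantage being argued for; the cost is that finding a word within one letter means walking that letter's chain from its head.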

Sam · Power User (Posts: 441)

This also effectively solved the problem of scaling while keeping memory use safe, because it's trivial to split wrda into multiple lists of 500 or so elements and just use # to point the program to different ones, or even to split wrdindex into multiple strings.

I'll have the calc generate that string now, and I'll get back with results.
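For reference, the index arithmetic behind splitting wrda into lists of about 500 elements might look like the sketch below. It is Python for illustration only; the chunk names `wrda1`, `wrda2`, ... and both function names are assumptions, and on the calculator the returned name would be handed to # rather than used as a dictionary key.

```python
CHUNK = 500  # assumed chunk size, from "500 or so elements"

def split_into_chunks(items, chunk=CHUNK, prefix="wrda"):
    """Return {name: sublist} with hypothetical names prefix1, prefix2, ..."""
    return {
        f"{prefix}{n + 1}": items[start:start + chunk]
        for n, start in enumerate(range(0, len(items), chunk))
    }

def locate(index, chunk=CHUNK, prefix="wrda"):
    """Map a 1-based global index to (list name, 1-based local index)."""
    n, local = divmod(index - 1, chunk)
    return f"{prefix}{n + 1}", local + 1

chunks = split_into_chunks(list(range(1, 1201)))
name, local = locate(1001)
print(name, local, chunks[name][local - 1])
```

The 1-based indices mirror how lists are indexed on the calculator, which is why `locate` subtracts one before dividing and adds one back afterwards.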

Travis · Power User (Posts: 428)

https://en.wikipedia.org/wiki/Binary_search_algorithm

The basic idea is to compare the item you're looking for with the middle value in the list. Assuming the list is sorted, if the two don't match then the item must be in either the first or the second half of the list, depending on whether the middle item is greater than or less than what we're searching for. So we take that half of the list and repeat the process: check the middle item of the half, break that down into another half-list, and so on. This continues until the item is found, or until we've narrowed things down to two adjacent items and can conclude that the item isn't there at all, since it would have had to be between those two.
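The halving described above can be sketched as follows. This is Python for illustration, not TI-BASIC; the function name and the choice of returning 0 for "not found" are assumptions, and it relies on the list already being sorted with the same comparison used in the search.

```python
def binary_search(sorted_words, target):
    """Return the 1-based position of target in sorted_words, or 0 if absent."""
    lo, hi = 1, len(sorted_words)
    while lo <= hi:
        mid = (lo + hi) // 2
        item = sorted_words[mid - 1]
        if item == target:
            return mid
        if item < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return 0               # bounds crossed: target is not in the list

words = ["amo", "arma", "bellum", "cano", "virum"]
print(binary_search(words, "cano"))
```

Each pass discards half of the remaining range, so a list of 500 entries needs at most 9 comparisons instead of up to 500 for a linear scan.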

Sam · Power User (Posts: 441)

The verb conjugations are in the wrda metadata, so the program will always know what a given conjugation is.
