- 68k Latin to English Translator (Dev Thread)
- 23 Apr 2019 07:49:32 am
- Last edited by Sam on 05 Feb 2020 11:49:06 am; edited 13 times in total
Link to Github repo
Link to progress update 4/29/19
This is going to be a very niche use case, but ah well, I'm gonna post it anyway. As the title states, I am well underway on a translator from Ciceronian Latin into standard English on a Voyage 200. As for my motivation for this project, I have to admit it was mostly spite (a terribly good motivator), as both my classmates and my Latin teacher said it was impossible.
"ImPoSsIbLe"
The lateng() Function
The program is heavily abstracted so that grammar rules can be added easily when necessary. At its highest level, the program is divided into a grammar organizer function and a word identifier function. The program (function lateng()) takes a single string as input.
Code:
lateng("In pictura est puella.")
The following documentation has been quoted for posterity, but it is not representative of the current state of this program.
Sam wrote:
Immediately, the program splits the input into a list with one word per element, converts everything to lowercase, and removes punctuation.
Code:
"In pictura est puella."
{"in","pictura","est","puella"}
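For anyone following along off-calculator, the splitting step above could be sketched roughly like this. It is Python rather than the TI-BASIC the program is actually written in, and the name tokenize() is made up for this example, so treat it as an illustration of the idea and not the real code.

```python
import string

def tokenize(sentence):
    """Split a sentence into lowercase words with punctuation stripped.

    Hypothetical helper: illustrates the step, not the calculator code.
    """
    # Delete every ASCII punctuation character from the input.
    cleaned = sentence.translate(str.maketrans("", "", string.punctuation))
    # Lowercase, then split on any run of whitespace.
    return cleaned.lower().split()

print(tokenize("In pictura est puella."))
```

Splitting on whitespace after stripping punctuation means stray double spaces do not produce empty list elements.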
The program then replaces each element with the translated version of the word, using the identify() command, which I'll cover below.
Code:
{"in","pictura","est","puella"}
{"in","picture","is","girl"}
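A rough Python sketch of that replacement step, with a tiny hard-coded glossary standing in for identify(). The GLOSSARY dictionary and translate_words() are assumptions made up for this example; the real program looks words up in its library lists instead.

```python
# Hypothetical stand-in for identify(): four words from the example sentence.
GLOSSARY = {"in": "in", "pictura": "picture", "est": "is", "puella": "girl"}

def translate_words(words):
    """Replace each Latin word with its English gloss.

    Words missing from the glossary are passed through unchanged.
    """
    return [GLOSSARY.get(word, word) for word in words]

print(translate_words(["in", "pictura", "est", "puella"]))
```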
The program finally applies necessary articles, capitalization and punctuation, then concatenates the list.
Note: This only works for crude Latin that is already in English word order. The remaining work I need to do entails rewriting lateng() to translate Latin in any word order. I've been focusing on identify() more than anything else so far.
Code:
{"in","picture","is","girl"}
"In the picture is a girl."
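The final step might look something like the Python sketch below. The post does not say how the program chooses between "the" and "a", so the article rule here (a noun right after a preposition gets "the", any other noun gets "a") is purely an assumption invented to reproduce the example above, and finish_sentence() is a hypothetical name.

```python
def finish_sentence(words, nouns, prepositions):
    """Add articles, capitalize, punctuate, and join the translated words.

    Assumed article rule (NOT taken from the real program): a noun directly
    after a preposition gets "the", any other noun gets "a".
    """
    out = []
    for i, word in enumerate(words):
        if word in nouns:
            after_prep = i > 0 and words[i - 1] in prepositions
            out.append("the" if after_prep else "a")
        out.append(word)
    sentence = " ".join(out)
    # Capitalize only the first letter and close with a period.
    return sentence[:1].upper() + sentence[1:] + "."

print(finish_sentence(["in", "picture", "is", "girl"], {"picture", "girl"}, {"in"}))
```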
The identify() Function
The identify() function is a very versatile function which will translate any Latin word contained in its libraries into English, along with the necessary grammatical data. It takes a Latin word as input, and returns a list containing various data depending on part of speech.
I should note that I am aware that textbook Latin is far removed from real Latin, but this program will translate either just fine. Real Latin might be much more difficult for a person to translate, but the program does not care. Real Latin’s excess of grammatical exceptions and obscure rules is, counterintuitively, quite trivial to code in. (When() statements ftw!) This program will translate real Latin exactly as easily as it translates garbage textbook Latin.
Code:
Verb: {part of speech, English translation, conjugation, tense, person, singular/plural}
Noun: {part of speech, English translation, declension, case, gender, singular/plural}
Adj: {part of speech, English translation, declension, case, gender, singular/plural}
Pro: {part of speech, English translation}
Conj: {part of speech, English translation}
Prep: {part of speech, English translation}
Adv: {part of speech, English translation}
Int: {part of speech, English translation}
As well as taking a string input, identify() uses two massive library lists (EDIT: It now uses three.) to determine word translations, wrda and wrdb. wrda contains strings holding Latin word bases, as well as various part-of-speech-specific data.
Code:
//Various entries contained in wrda:
//The first digit always denotes part of speech, while the following digits are specific to that part of speech. The numeric data always takes up four characters, even if the word does not use all of them.
"1320aestat aestas "
//The 1 means this is a noun, the 3 means the noun is third declension, the 2 means it is feminine, and the 0 means it is not living. The noun has two bases: the second base is the nominative singular of the word, and the first base is the base for all the remaining forms.
"5xxxad "
//The 5 means this is a preposition, but since prepositions have no extra data, the unused characters are filled with x. The Latin word is "ad"
"0100ambul ambulare ambulav ambulat "
//The 0 means this word is a verb, the 1 means the verb is first conjugation, the second 0 means the verb is neither irregular nor takes dative nouns, and the third 0 means the verb is not deponent. Because it is not irregular, the verb has four principal parts, the first being the present base, then the infinitive base, then the perfect base, then simply the fourth principal part, used for perfect, pluperfect, and future perfect verbs that are passive.
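To make the entry layout concrete, here is a small Python sketch that pulls a wrda string apart: four characters of coded data, then space-separated bases. The function name and the returned dictionary are assumptions for this example, and only the three part-of-speech codes mentioned above (0, 1, 5) are filled in.

```python
# Only the part-of-speech codes that appear in the post; others are unknown here.
PARTS_OF_SPEECH = {"0": "verb", "1": "noun", "5": "preposition"}

def parse_wrda(entry):
    """Split a wrda entry into part of speech, extra flags, and bases."""
    flags = entry[:4]           # the data always occupies four characters
    bases = entry[4:].split()   # the rest is space-separated bases
    return {
        "pos": PARTS_OF_SPEECH.get(flags[0], "unknown"),
        "flags": flags[1:],     # meaning depends on the part of speech
        "bases": bases,
    }

print(parse_wrda("1320aestat aestas "))
print(parse_wrda("5xxxad "))
```

Because the coded data is fixed-width, the parser never has to guess where the first base starts.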
wrdb is much simpler, giving simple English translations with no appended data. Below are corresponding wrdb entries to the wrda entries above.
Code:
//Various entries contained in wrdb:
"summer "
"to "
"walk walking walked "
//Verbs are unique in that they contain three English translations corresponding to various tense/person/number combinations.
The following documentation has been changed slightly since its conception. View the changes here.
The identify() command uses inString() to look for Latin bases within your input string. For instance, identify() will decide that the input string "aestatem" matches "1320aestat aestas " because "aestat" is a substring of "aestatem". For wrda entries with certain parts of speech, the input string must be identical to the base for identify() to recognize it. For instance, identify() knows that "ad" means "to", but it will not match "adeiufosdfv".
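The matching rule can be sketched in Python as below, with the in operator playing the role of inString(). The post only says that "certain parts of speech" need an exact match, so limiting that to prepositions (code 5) is an assumption here, as are the names find_entry() and EXACT_ONLY and the three-entry sample lists.

```python
# Sample library lists, kept in parallel: WRDB[i] translates WRDA[i].
WRDA = ["5xxxad ", "1320aestat aestas ", "0100ambul ambulare ambulav ambulat "]
WRDB = ["to ", "summer ", "walk walking walked "]

# Assumption: only prepositions (code 5) require an exact match.
EXACT_ONLY = {"5"}

def find_entry(word):
    """Return the index of the first wrda entry matching word, or None."""
    for i, entry in enumerate(WRDA):
        pos, bases = entry[0], entry[4:].split()
        for base in bases:
            if pos in EXACT_ONLY:
                matched = word == base   # whole word must equal the base
            else:
                matched = base in word   # substring test, like inString()
            if matched:
                return i
    return None

index = find_entry("aestatem")
print(WRDB[index].split()[0])
print(find_entry("adeiufosdfv"))
```

A plain substring test will also match a base in the middle of a longer word, so a prefix check would be a stricter alternative.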
Once identify() matches the input string to a word (or doesn't, in which case it returns null), it checks whether the part of speech requires an ending beyond the base (with exceptions, such as irregular verbs and irregular nominative singular nouns). If it does, it chops the ending off the input string, compares it against every possible ending for that word, and adds the results to a list of possible endings. For words whose ending can be translated in multiple ways, it simply adds each reading to the list. Below are some examples of what identify() would return.
Code:
identify("aestatem")
{"Nou","summer","Acc","Fem","Sin"}
//This noun means summer, it is in the accusative case, it is feminine, and it is singular.
identify("aestatibus")
{"Nou","summer","DatAbl","Fem","PluPlu"}
//This is the same noun with a different ending that has multiple meanings. It can be dative or ablative plural.
identify("ambulati eramus")
{"Ver","had been walked","1st","Plu","Pas","1st","Plu"}
//This verb translates as "had been walked", is 1st conjugation, pluperfect tense, passive voice, 1st person, and plural.
identify("ad")
{"Pre","to"}
//This preposition simply means "to".
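The two noun examples above can be reproduced with a short Python sketch of the ending step. The table holds the regular third-declension consonant-stem endings for masculine and feminine nouns (vocative left out). The analyze_noun() function and the way ambiguous readings are concatenated are assumptions modeled on the sample output, not the program's actual code. The irregular nominative singular (the second base) is not handled here.

```python
# Ending -> list of (case, number) readings, third declension masc./fem.
THIRD_DECLENSION = {
    "is": [("Gen", "Sin")],
    "i": [("Dat", "Sin")],
    "em": [("Acc", "Sin")],
    "e": [("Abl", "Sin")],
    "es": [("Nom", "Plu"), ("Acc", "Plu")],
    "um": [("Gen", "Plu")],
    "ibus": [("Dat", "Plu"), ("Abl", "Plu")],
}

def analyze_noun(word, base, english, gender):
    """Chop the base off word and look up every reading of the ending."""
    if not word.startswith(base):
        return None
    readings = THIRD_DECLENSION.get(word[len(base):])
    if readings is None:
        return None
    # Ambiguous endings are joined together, e.g. "DatAbl" and "PluPlu".
    cases = "".join(case for case, _ in readings)
    numbers = "".join(number for _, number in readings)
    return ["Nou", english, cases, gender, numbers]

print(analyze_noun("aestatem", "aestat", "summer", "Fem"))
print(analyze_noun("aestatibus", "aestat", "summer", "Fem"))
```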
Thus this program will translate any Latin thrown at it, provided the words are contained within the word library. This is the biggest limitation, because I have to log all the words by hand into my Voyage 200. I've made a shell program to assist me, but it's still tedious. Another annoyance is the fact that lists are not random access, meaning it takes significantly longer to identify "villa" than it does to identify "ager" (wrda is sorted alphabetically). With all this in mind, the calculator ends up taking around three minutes to translate a full sentence. "In pictura est puella" took a minute and a half. I'll put the source code up whenever I get around to it.