I'm creating a C program for a TI-89 Titanium that plays back an encoded video. The video is larger than 64 KB (even after encoding it as compactly as I could), so I know I'll have to use multiple files. I have tried two ways to do this:

    Each file is a program, and a controlling program runs them in the correct order. This works, but only for the first two programs: after the third one is run, the calculator reports "Invalid program reference." I have found that this means the calculator is out of memory (188 KB memory total, 64 KB * 3 > 188 KB).

    Each file is still a program, but instead of running them, use fopen to read the data they contain. When using a hex editor on the compiled 89z programs, the video data shows up exactly as stored in the C program. This data would be used by a single running program that would reuse memory. However, when trying to read this data on the calculator, the data is the same for the first few bytes, but then is completely different.


I just want a solution to one of these problems, or something that might work. For the first one, maybe there is a way to clear the memory after a program has run. All I know is that the subprograms are not removed from memory until the controlling program has finished. I couldn't find any functions for clearing RAM in the TIGCC documentation. For the second, maybe the program is compressed after being transferred to the calculator; I am not sure. I have also been wondering whether the App signing keys are available anywhere: I've read that the community figured them out, but I have not seen them anywhere. I know that apps do not have the same 64 KB limit as programs, so I could probably switch to 68k if that is available. Regardless, any help/ideas are welcome.
(I posted this on TI calculators subreddit a while ago, but no one replied.)
I'll go into more detail about this later (it's now time for me to go to work ^^), but I'd say that you need to change your approach, as in the general case, launching a program from another is a pain. There are two main options:
* a single program which uses libraries, either DLLs ( https://debrouxl.github.io/gcc4ti/htdll.html , https://debrouxl.github.io/gcc4ti/dll.html ) or the more powerful dynamic libraries offered by "kernel"s (but then your program will require PreOS);
* a FlashApp, but then you need to put up with the sub-par compiler from the Sierra C toolchain packaged with TIFS, and sign the FlashApp (the keys are indeed somewhere).

Back in the day, I did a PoC, http://tict.ticalc.org/downloads/launchmultiple.tar.bz2 .
Thanks for the quick reply. Will try to understand once you go into more detail.
From looking at what you're trying to do, I think you would want to keep the encoder program small and store the data in large files in flash memory. Then, open those for reading and read the data out (TI89t supports text and/or binary files; no need to make them programs)
If binary files are indeed supported, please let me know how to convert from .bin to a readable format for the TI89T. I tried to figure this out but could not.
Also, can I just use fopen with these files? As I said, I tried doing this with the programs, but the data I read did not match what showed up in a hex editor.
* arbitrary binary data (of size <= 65518 bytes per file, that is) can be wrapped into computer-side files, suitable for transfer to a TI-68k series calculator (using TI-Connect or TILP), with tools such as ttbin2oth from the TI-68k Developer Utilities. ttbin2oth is part of GCC4TI ( https://github.com/debrouxl/gcc4ti/tree/next/trunk/tigcc/tools ), as such tools should be, and the standalone version can be downloaded from http://tict.ticalc.org/downloads/tt140.tar.bz2 ;

* you can indeed use functions from <stdio.h> to access files, but using the native functions from <vat.h> (SymFind, DerefSym, SymFindFirst, SymFindNext, etc. - see https://debrouxl.github.io/gcc4ti/vat.html ) would enable you to benefit from direct access to the memory-mapped files (like you'd do on a "big" computer after mmap() on *nix, or MapViewOfFile on Windows), which is much more efficient, and takes less space in your program.
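To make the 65518-byte limit mentioned above concrete, here is a minimal computer-side sketch of how a long video stream could be divided into per-file chunks before each chunk is wrapped with ttbin2oth. The names `MAX_CHUNK`, `chunk_count` and `chunk_size` are made up for illustration; only the 65518 figure comes from this post.

```c
#include <assert.h>
#include <stddef.h>

/* Largest payload per calculator file, as stated in the post. */
#define MAX_CHUNK 65518u

/* Number of files needed to hold `total` bytes of video data. */
size_t chunk_count(size_t total)
{
    return (total + MAX_CHUNK - 1) / MAX_CHUNK;
}

/* Size in bytes of chunk number `index` (0-based). */
size_t chunk_size(size_t total, size_t index)
{
    size_t start = index * MAX_CHUNK;
    assert(index < chunk_count(total));
    return (total - start < MAX_CHUNK) ? total - start : MAX_CHUNK;
}
```

Each chunk would then become one .bin file given to ttbin2oth, and the player would open the resulting files in order.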

Programs from TICT typically use native filesystem (VAT) access, and at least two of them use ttbin2oth as part of their build process:
* TICT-Explorer:

Code:
ttbin2oth -quiet -92 "LIB" tictexpl.bin tictexpl
ttbin2oth -quiet -89 "LIB" tictexpl.bin tictexpl

* FAT-Engine:

Code:
ttbin2oth -89 dll fat.bin fatlib
ttbin2oth -92 dll fat.bin fatlib
ttbin2oth -89 dll fat.bin fatlib_c
ttbin2oth -92 dll fat.bin fatlib_c
Thanks for the great utilities. This did solve my original problem. Though, I am using fopen, as it is what I know how to do. You mentioned the <vat.h> functions, I took a look at the documentation, but was unable to see how I would use them (I am still very new to this). If using <vat.h> is more efficient and faster, then I would like to better understand it.

I noticed that when I was still using programs, they were loaded into memory very quickly. I also noticed in the <vat.h> documentation EM_twinSymFromExtMem and SymDelTwin. Is this what I need to use?

Thanks for all the help thus far.
Nope, the twin symbol functions are used internally by AMS as part of the process of launching assembly programs, but they're seldom used by external programs. The usual program launchers (program-specific pstarter, generic ttstart, or the SuperStart FlashApp) do not use them. TICT-Explorer does, though.

You can find usage examples for the 4 functions I mentioned in my previous message inside many TI-68k/AMS programs. Two simple examples which have been packaged with the toolchain since pre-GCC4TI times:

https://github.com/debrouxl/gcc4ti/blob/next/trunk/tigcc/examples/Handle%20a%20variable%20with%20VAT%20functions.c
https://github.com/debrouxl/gcc4ti/blob/next/trunk/tigcc/examples/List%20variables%20and%20folders.c

Among the TICT programs which use them, I can see TI-Chess (tichess), TICT eBook Reader (ebook), TICT-Explorer (both tictex and tictexpv), TI-Miner and TI-Puzzlize.

For SymFind + DerefSym, simplified from TI-Puzzlize's load.c:


Code:

SYM_ENTRY* arc_symptrs[10] = {NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL};
short      arc_entries[10] = {0,0,0,0,0,0,0,0,0,0};

    unsigned char *src;
    char           tmpstr[100];
    short          loop;
    char* fname = "puzpics";

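    for (loop=0;loop<10;loop++) {
        // VAT functions take a pointer to the terminating zero byte of a name
        // which is itself preceded by a zero byte, hence tmpstr[0] = 0.
        tmpstr[0] = 0;
        sprintf(tmpstr+1,"%s%d",fname,loop);

        arc_entries[loop] = 0;

        // "puzpics" + one digit = 8 characters, so the terminator is at tmpstr+9.
        if (!(arc_symptrs[loop] = DerefSym(SymFind(tmpstr+9)))) {
            //printf("%s not found\n",tmpstr+1);
            //ngetchx();
            continue;
        }

        // Lock the file's memory block and get a direct pointer to it.
        src = HLock(arc_symptrs[loop]->handle);
        src+=2; // skip the 2-byte size word: src is now a pointer to the beginning of data contained in the file.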
...
        HeapUnlock(arc_symptrs[loop]->handle);


TI-Chess, TICT-Explorer (both tictex and tictexpv) and TICT eBook reader use iteration through the VAT with SymFindFirst & SymFindNext.
Thanks :)
The final bottleneck of my program now is drawing. I looked for faster ways to draw horizontal lines and came across a TICT tutorial about this exact topic. I made it work for horizontal lines only, however, it was still slower than the DrawLine method. I read on the FAQ for gcc4ti that the default DrawLine method is optimized for horizontal/vertical lines. Is there a faster way to draw to the screen? The data is encoded in such a way that drawing lines is the most convenient way, but would it be worth it to convert it to something else in code? (Maybe sprites? I've tried DrawMultiLines, BitmapPut and FillLines, but they seemed slower)
In general, AMS's entire graphics stack is slow, partially because it tries to deal with non-screen-sized planes. Faster drawing on screen-size (or at least screen-width) buffers is provided by specialized graphics libraries such as TICT's ExtGraph, whose latest version is available from https://github.com/debrouxl/ExtGraph .
The ExtGraph function dedicated to horizontal lines is FastDrawHLine_R ( https://github.com/debrouxl/ExtGraph/blob/experimental/src/lib/Line/FastDrawHLine_R.s ). However, in order to reach optimal speed, it needs to be further tailored to your needs: for instance, inlining into your code, folding address computations, removing the drawing modes you don't need, probably shortening the handling for short lines, etc.

For the video content you want to draw to be amenable to horizontal line drawing, it must have a pretty special visual form? In which case, if you've got many lines short enough to fit in one or two word writes, the speed-optimal way to handle them would be to generate code on the computer side and execute it on the calculator side: ori.w #0xnnnn,d(an) / ori.l #0xnnnnnnnn,d(an) instructions for turning pixels on (eori for reversing them, andi with negative mask to turn them off) would be respectively 6 and 8 bytes, and would be very much faster (and smaller) than a call to an out-of-line function.
However, for a 240 x 128 screen, horizontal lines can be encoded in 3 bytes, whether it's (1 byte x1, 1 byte x2, 1 byte y) encoding or (2 bytes number of pixels from the top left corner to the first left pixel of the row, 1 byte number of pixels in the row), so generated code ought to be larger.
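As a rough illustration of the computer-side code generation idea, the sketch below emits the 6 bytes of one `ori.w #mask,disp(a0)` instruction for a short line. It rests on several assumptions that are not from the thread: `a0` is taken to hold the address of the screen plane, the line must fit inside one aligned 16-bit word, and the function names `emit_ori_w` and `emit_hline_word` are hypothetical. The opcode layout (ORI, word size, address-register-plus-displacement destination, immediate word before the displacement word) follows the standard 68000 encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emit "ori.w #mask,disp(An)" as 6 big-endian bytes; returns bytes written. */
size_t emit_ori_w(uint8_t *out, unsigned an, uint16_t mask, uint16_t disp)
{
    uint16_t opcode;
    assert(an < 8);
    opcode = (uint16_t)(0x0068u | an);   /* ORI, size = word, mode = d16(An) */
    out[0] = (uint8_t)(opcode >> 8); out[1] = (uint8_t)(opcode & 0xFF);
    out[2] = (uint8_t)(mask >> 8);   out[3] = (uint8_t)(mask & 0xFF);
    out[4] = (uint8_t)(disp >> 8);   out[5] = (uint8_t)(disp & 0xFF);
    return 6;
}

/* Emit code that turns on pixels x1..x2 (inclusive) of row y.
   Assumes a 30-byte row stride and that both ends lie in the same word. */
size_t emit_hline_word(uint8_t *out, unsigned x1, unsigned x2, unsigned y)
{
    uint16_t left, right;
    assert(x1 <= x2 && x2 < 240 && y < 128);
    assert((x1 >> 4) == (x2 >> 4));
    left  = (uint16_t)(0xFFFFu >> (x1 & 15));         /* leftmost pixel is bit 15 */
    right = (uint16_t)(0xFFFFu << (15 - (x2 & 15)));
    return emit_ori_w(out, 0, (uint16_t)(left & right),
                      (uint16_t)(y * 30 + 2 * (x1 >> 4)));
}
```

Each such line costs 6 bytes of generated code against 3 bytes in the data encodings described above, which is the size trade-off being pointed out.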
Thanks, the ExtGraph function works well, even without the optimizations you suggested. I'm not sure how to go about doing the optimizations, as I know very little assembly (I went through some code explanations on TechnoPlaza a while ago, but never really put them to use), and I only learned C for this about two weeks ago.

If you are asking about the encoding, then video is encoded as follows: First two bytes are the length (usually 64kb), and the rest are byte x-positions. Every frame is started assuming the color is black, and then keep track of every X position where the line (going from left to right) changes color (it can be zero if it isn't black). Also keep track of the end of each horizontal line (for this I use 255), however in some cases this step is not necessary (when the pixel on the next line < the last pixel of the previous line). This is basically it. I came up with it as an alternative to run length encoding. I think it is pretty optimal for two colors, but maybe there is more that can be done to compress it.
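For readers who want to see this encoding spelled out, here is a small decoder sketch for a single frame. It is only one reading of the description above, with assumptions labeled: every row is assumed to restart in black, transition positions within a row are assumed to be strictly increasing, a byte smaller than the previous transition is treated as the implicit start of the next row, and the last row of the stream is assumed to end with an explicit 255. The output is one byte per pixel for clarity rather than the packed screen format, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_W 160
#define FRAME_H 100
#define END_OF_ROW 255

/* Fill pixels [x0, x1) of one row with colour c (1 = black, 0 = white). */
static void fill(unsigned char *row, int x0, int x1, unsigned char c)
{
    assert(x0 <= x1 && x1 <= FRAME_W);
    memset(row + x0, c, (size_t)(x1 - x0));
}

/* Decode one frame from data[0..len) into frame.
   Returns the number of bytes consumed, or 0 on malformed input. */
size_t decode_frame(const unsigned char *data, size_t len,
                    unsigned char frame[FRAME_H][FRAME_W])
{
    size_t i = 0;
    int y = 0, x = 0, prev = -1;   /* prev: last transition in this row */
    unsigned char colour = 1;      /* assumption: rows start in black */

    while (y < FRAME_H && i < len) {
        unsigned char b = data[i];
        if (b == END_OF_ROW || (prev >= 0 && b < prev)) {
            fill(frame[y], x, FRAME_W, colour);   /* finish the current row */
            y++; x = 0; prev = -1; colour = 1;
            if (b == END_OF_ROW)
                i++;               /* explicit marker is consumed */
            continue;              /* implicit end: b is re-read for the new row */
        }
        if (b >= FRAME_W)
            return 0;
        fill(frame[y], x, b, colour);
        colour ^= 1;               /* colour changes at position b */
        x = prev = b;
        i++;
    }
    return (y == FRAME_H) ? i : 0;
}
```

In a real player the calls to `fill` for black runs would be replaced by horizontal line draws into the screen buffer.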

In the second paragraph, are you suggesting that I put all the instructions in these files instead, and then run them rather than draw them? Again, I'm new, so I was a little confused by the terminology used in some of your response. I would be happy to attempt further optimizations and learn more about how it would be done.

I'm only making this for the TI-89, so a 160x100 screen is all I need to support.

For a constant framerate I am using the timer and setting the PRG rate to the highest. BTW, do you know what OSC2 is on this https://debrouxl.github.io/gcc4ti/intr.html#PRG_setRate ? I think it is around 10000, but not sure.
More later, but I'll reply to the end of your post first :)

The 89 / 92+ / V200 / 89T screen buffers all have the same 240-pixel / 30-byte width; supporting the 160x100 portion of screen makes it possible to draw only 2000 bytes out of 3840, which saves ample time.
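The arithmetic behind those figures is 20 bytes x 100 rows = 2000 bytes for the visible window, against 30 bytes x 128 rows = 3840 bytes for the whole buffer. A minimal sketch of touching only the visible window, using an ordinary array in place of the real screen memory and hypothetical names throughout:

```c
#include <assert.h>
#include <string.h>

#define LCD_STRIDE 30    /* bytes per row in the screen buffer (240 pixels) */
#define LCD_ROWS   128
#define VIEW_BYTES 20    /* 160 visible pixels per row on the 89 / 89T */
#define VIEW_ROWS  100

/* Clear only the visible 160x100 window; returns the number of bytes written. */
unsigned clear_view(unsigned char *lcd)
{
    unsigned y, touched = 0;
    assert(lcd != NULL);
    for (y = 0; y < VIEW_ROWS; y++) {
        memset(lcd + y * LCD_STRIDE, 0, VIEW_BYTES);  /* skip the 10 hidden bytes */
        touched += VIEW_BYTES;
    }
    return touched;
}
```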

OSC2 is the second hardware oscillator. See https://github.com/debrouxl/tiemu/blob/master/tiemu/trunk/docs/ti_hw/misc/J89hw.txt and its http://tict.ticalc.org/docs/J89hw.txt mirror for more details about the TI-68k series' hardware. The documentation for HW3+ (89T) port range 710000-7100FF is missing from that document, but you don't need to fiddle with the new RTC or USB hardware for your purposes :)
On the highest increment rate for port 600017 and the highest starting value, the timer should be triggered at ~8192 Hz on HW2+. At this rate, the mere fact of handling interrupts and returning takes a non-negligible slice of the CPU power.
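The ~8192 Hz figure can be reproduced with a little arithmetic, under assumptions that go beyond what is stated here: OSC2 is taken to be about 2^19 Hz on HW2+, the fastest increment rate is taken to be OSC2 / 2^5, and the interrupt is assumed to fire once every (257 - start) increments. Please check these against J89hw.txt before relying on them; the function names are hypothetical.

```c
#include <assert.h>

/* Assumption: OSC2 runs at roughly 2^19 Hz on HW2+. */
#define OSC2_HZ 524288.0

/* Interrupt frequency for a divider of 2^rate_shift and a start value 0..255.
   Assumption: one interrupt every (257 - start) timer increments. */
double prg_frequency(unsigned rate_shift, unsigned start)
{
    assert(rate_shift < 31 && start <= 255);
    return OSC2_HZ / (double)(1UL << rate_shift) / (double)(257 - start);
}

/* Start value whose frequency is closest to target_hz for a given divider. */
unsigned prg_start_for(unsigned rate_shift, double target_hz)
{
    double ticks;
    long period;
    assert(rate_shift < 31 && target_hz > 0.0);
    ticks = OSC2_HZ / (double)(1UL << rate_shift) / target_hz;
    period = (long)(ticks + 0.5);        /* round to the nearest increment count */
    if (period < 2)   period = 2;        /* start = 255 */
    if (period > 257) period = 257;      /* start = 0 */
    return (unsigned)(257 - period);
}
```

With these assumptions, `prg_frequency(5, 255)` gives 8192 Hz, and a divider of 2^9 with a start value of 155 lands close to 10 Hz.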
Thanks. Here's a peek at it using default ExtGraph: https://www.youtube.com/watch?v=4rH-GwKOmh8 .
It's 10 FPS and 160x100. Not bad, but maybe there can be more FPS with a few optimizations. :)
10 FPS isn't bad indeed, all the more so since, back in the day, the saying went that the screen's blurriness made it uninteresting to go beyond ~12 FPS. Code generation would make it possible to reach the uninteresting area, at a high size cost... so you should indeed not bother doing that :)

The main optimizations would be inlining to remove branches and returns, removing handling for the A_XOR mode + specializing one side for the pixel setting + one side for the pixel clearing (which would remove the mode argument and therefore the need to push, test and pop it onto the stack), removing the initial handling for reversed arguments, keeping a copy of the plane line offset (30 * y) instead of re-computing it every time, etc.
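A few of these ideas can be shown in portable C, although the real gains come from doing them in assembly inside FastDrawHLine_R. The sketch below is not ExtGraph code: it is a hypothetical set-only line routine (no mode argument, no swapping of reversed arguments) plus a caller that computes the 30 * y row offset once and then advances by addition.

```c
#include <assert.h>

#define STRIDE 30   /* bytes per row of the 240-pixel-wide plane */

/* Set pixels x1..x2 (inclusive) on the row that starts at `row`.
   Specialized for "turn pixels on": no mode test, x1 <= x2 is required. */
static void hline_set(unsigned char *row, unsigned x1, unsigned x2)
{
    unsigned char *p, *last, lmask, rmask;
    assert(x1 <= x2 && x2 < 240);
    p     = row + (x1 >> 3);
    last  = row + (x2 >> 3);
    lmask = (unsigned char)(0xFF >> (x1 & 7));        /* leftmost pixel is bit 7 */
    rmask = (unsigned char)(0xFF << (7 - (x2 & 7)));
    if (p == last) {                 /* short line inside a single byte */
        *p |= (unsigned char)(lmask & rmask);
        return;
    }
    *p++ |= lmask;
    while (p < last)
        *p++ = 0xFF;                 /* whole bytes in the middle */
    *p |= rmask;
}

/* Draw one span per row on n consecutive rows starting at y0. */
void draw_spans(unsigned char *plane, unsigned y0,
                const unsigned char (*spans)[2], unsigned n)
{
    unsigned char *row = plane + y0 * STRIDE;   /* 30 * y computed only once */
    unsigned i;
    for (i = 0; i < n; i++, row += STRIDE)
        hline_set(row, spans[i][0], spans[i][1]);
}
```

A matching clear-only routine would use `&=` with the inverted masks, which removes the need for a mode parameter altogether.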

Could you post the code somewhere ?
Yeah, I think maybe just the easy-to-add optimizations are worth implementing. I was thinking about going overboard with the encoding, but figured what I have now is probably nearly as good as it gets. :)

Here is the C code: https://pastebin.com/ZkpDQSW3 . When compiled, this is run in the same directory as all the bin files (the video data). The project includes this C file, a header with only the horizontal line draw defined, and the ExtGraph archive. It does not have the optimizations yet (time to sleep).
That's an alright first step for a high-level program. However, fixing e.g. the side effects on the system timers (e.g. the reason why you need to set the APD timer interval at huge values) requires a major architectural change, namely switching from [system timers + system keyboard reading] to [own interrupt handlers + low-level keyboard reading + low-power mode], so I'd have to spend a while on the matter...

In general, very few calculator models (all manufacturers included) have hardware binary floating-point units, so calculator programs should not use floating-point computations, unless they need to :)
It's a good thing the floating-point computations, which are software-emulated on this platform (and with a BCD floating-point format, at that), are only run about once per frame, instead of once per line, otherwise they'd really kill performance.
Thanks for the tips.

For now, I'll remove the key inputs, as they were not very useful anyway.

I attempted to convert to interrupts; however, it doesn't work properly. The code/comments probably make it clear that I don't know much about interrupts: https://pastebin.com/winMDy00
The key differences in this code: all the data pointers are put in a global array at the beginning, then the interrupt is set up; a lot of variables are now volatile and global; and the interrupt handler contains the code to draw a single frame.
Here is how this runs: https://imgur.com/a/9OGLnn3

Based on your response that it would require major changes, I don't think I changed enough. :)
Or it could just be a silly mistake, if this should work, though I doubt it.

I've also removed the float math to further simplify for now.
* you need to stop using the system timers, and disable AUTO_INT_1, which can be done using OSSetSR(0x0200), or shorter, asm volatile("move.w #0x0200,%d0; trap #1");. Use 0x0000 to restore at the end of the program.
* consequently, you need to switch to _rowread (boo _keytest, in that case) to check whether a key was pressed. This can be done in a single invocation with a wide mask: _rowread(0x0000) & 0xFF. ISTR that F000 instead of 0000 works on all models as well, but FF00 does not.
* you could probably leave the default rate for the PRG, reduce the starting value to reach ~10 Hz for now, and perform the entire drawing work into and below the interrupt handler. Later optimization will make it possible to raise the initial value again.
* most of the variables you moved to global scope and declared volatile should be local variables inside the interrupt handler, since they're not used by the main, and volatile takes a toll on performance: requires storing the variables in memory and reloading them every time.
Wow, I did all of this, and was very surprised by the improvements. With those three implemented, it runs easily at 16 FPS (and probably more). Definitely a noticeable step-up. Thanks for your help.

Here is the current code: https://pastebin.com/BydBB6Lm
With this new code come a few more questions. :)

    Volatile:
    Is my use of volatile necessary? Why does it persist between program runs?

    Inputs:
    Is there a better way to check for inputs using rowread? I'm assuming that only one key is pressed at a time (which is pretty reasonable for this program), but what if there were multiple?

    FPS/PRG stuff:
    I have some math in a comment near the asm volatile stuff. I tried to calculate the value for the PRG start, and got a constant 59? Why does this constant show up? And if I use lower PRG rate, how can I use this for more custom framerates?

    Interrupts:
    I was surprised by how much switching to interrupts helped performance. Why does the program run so much better?


Again, thanks for all your help. :) I'm still happy to do some more optimizations if there are still some to do.

(The issue with my previous code was that I used short for a variable when I should've used char. Glad I didn't spend too long figuring this out.)
Good to see that there was a significant speedup :)

* for unarchived programs, global variables do indeed retain their values across program runs. You'll have to reinitialize some of them.
* _rowread can work one keyboard row at a time, so you can check 8 keys at once. Of course, the keyboard layout is different between the 89/89T and the 92+/V200, so when one needs to check many keypresses in a row, that's where _keytest / _keytest_optimized comes in. I'm not entirely a fan of its linker-based calculator-dependent "constants", though.
* I'll look at your math later, time to go to work;
* the OS's AUTO_INT_1 handler is responsible for both handling OS timers and reading the keyboard in such a way that functions like kbhit() can work. Additionally, see the bit about silent linking (checking the link port) in the kbhit() documentation, https://debrouxl.github.io/gcc4ti/kbd.html#kbhit . So I'm not surprised there's a measurable speed gain, although I wouldn't necessarily have expected it to be so large.

Later, we can use low-power mode based on port 600005, in order to remove the busy-loop before GKeyFlush().
  