My math was just an attempt to automate getting the PRG start value for a given FPS. For 16 FPS, there doesn't seem to be a PRG start value that gives exactly 16 Hz: I noticed today that a start value of 195 is too slow, and 196 is too fast. It seems like another approach is required.

Regardless, here is the (incorrect) math:

Code:
//solve(255 - ((8192 / 2^9) * x / 16) = 196, x) = 59, where 196 gives results close to 16 Hz


I'm still a little confused about how the PRG works. I know the default rate is OSC2 / 2^9. Plugging in 8192 for OSC2 from what you gave earlier, 8192 / 2^9 evaluates to 16. I am unsure how to interpret this rate number. It does not seem to be in Hz: wouldn't the PRG value go up by 16 every second in that case? (This is not what I see happening with printfs.)

I looked into low power mode a little bit but was unsure how to use idle(). I would be happy to remove that while loop.

Per the J89hw.txt file I linked above, OSC2 has ~2^19 Hz rate on HW2+, and a higher, significantly variable frequency on HW1, which is why the old-style RTC used by the 89, 92+ and V200, based on AUTO_INT_3 (*).
The standard rate and initial value for triggering AUTO_INT_5 are OSC2 / 2^9 = 1024 Hz (on HW2+) and 257 - 53, which yields ~19.32 Hz and an APD timeout of ~310-311 s (measured against the 1 Hz RTC on my calculator, anyway). Using an initial value of 257 - 51 cuts the APD timeout to ~299 s, which is closer to the target 300 s.
The OSC2 / 2^5 fast rate yields 2^14 = 16384 (HW2+), but from memory, the hardware needs to increment the internal counter exposed on port 600017 at least once for the interrupt to be triggered, so the initial value needs to be 1 lower - hence the 8192 value.
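A minimal host-side sketch of the arithmetic in the two paragraphs above, assuming each AUTO_INT_5 takes (257 - start) counter increments, and assuming the APD countdown is a fixed number of AUTO_INT_5 ticks (inferred from the ~310 s vs ~299 s figures, not from documentation). `int5_hz` and `scaled_apd_seconds` are hypothetical names.

```c
/* AUTO_INT_5 frequency for a given PRG start value, assuming the counter
   needs (257 - start) increments per interrupt and increments at rate_hz
   (OSC2 / 2^9 ~ 1024 Hz by default, OSC2 / 2^5 ~ 16384 Hz for the fast rate). */
static double int5_hz(double rate_hz, unsigned char start)
{
    return rate_hz / (257 - start);
}

/* APD duration after changing the start value, assuming the APD countdown is
   a fixed number of AUTO_INT_5 ticks, so it scales with the tick period. */
static double scaled_apd_seconds(double apd_s, unsigned char old_start, unsigned char new_start)
{
    return apd_s * (257 - new_start) / (257 - old_start);
}
```

Under these assumptions, the "rate" is the frequency at which the counter increments, not the interrupt frequency: at start 196 the counter counts at ~1024 Hz and wraps ~16.8 times per second, so it does not go up by 16 per second.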

idle() is the elaborate, more thorough and therefore slower way to idle. Many programs do not use it because it interferes with grayscale, but that's not an issue here. A quick job using port 600005 should be a good first step.

*: it's the reason why I used 0x0200 instead of 0x0400 for the interrupt mask above. Programs used the latter for years but had to change to the former after TI started providing the RTC functionality in V200 AMS 2.07 and 89/92+ AMS 2.08.

Got it.

I've attempted to make the playback FPS more accurate; however, my attempt seems quite inefficient. My approach is to set the PRG rate to 0, then have a check at the start of the interrupt handler: if a counter has reached a certain number, reset the counter and continue with the frame; otherwise, just increment the counter. This hurts performance (I would guess maybe because it has to preload the code every time? Not sure). My goal for now is to get to nearly 16 FPS; right now it's about 16.8 FPS, which is quite noticeable over time.

Also, the ~ in ~8192 means approximately, correct? Does this mean that it varies from calculator to calculator, or was it just uncertain at the time of writing? If it isn't exact, can I find the exact value with a program?

I've tried to use low power mode instead of the while loop, but it only worked in some situations. For example, with a PRG start of zero and a PRG rate of zero, low power mode worked. With the default rate, though, the calculator would wake up immediately and the program would finish. Here are all the ways I tried:

Code:
idle(); // Immediately awakes
poke(600005,0); // Immediately awakes
asm volatile("move.b #0x00,600005;"); // Immediately awakes
while (currentFile < TOTAL_STR); // Works (original)

I was just replacing the while loop with one of the other options. Am I missing something else that low power mode needs?

Oh, my bad. It's not a matter of removing the loop, but of putting the write to port 0x600005 into the loop. I should have written the hexadecimal prefix, though hardware port addresses are always implicitly given in hexadecimal. 600005 decimal points to nowhere on the 89T, and to a ghost of the RAM on older models, so on older models, writes to 600005 decimal are likely to corrupt memory.
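The difference between the two spellings is easy to check with a bit of address arithmetic. A minimal host-side sketch (it touches no hardware), assuming for illustration that the older models mirror their 256 KB of RAM across the low address range; `OLD_MODEL_RAM_SIZE` and `ghosted_ram_offset` are hypothetical names.

```c
/* Address arithmetic only; nothing here touches hardware. */
#define OLD_MODEL_RAM_SIZE 0x40000UL /* 256 KB: assumed RAM mirror size on the older models */

/* Offset inside RAM that an address in the mirrored range would hit,
   assuming the RAM is ghosted every OLD_MODEL_RAM_SIZE bytes. */
static unsigned long ghosted_ram_offset(unsigned long addr)
{
    return addr & (OLD_MODEL_RAM_SIZE - 1);
}
```

Under that assumption, 600005 decimal is 0x927C5, so a write meant for port 0x600005 would land at RAM offset 0x127C5 instead.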

The "~" in front of 8192 Hz follows the convention used in J89hw.txt ("HW2+: ??? - (~520 kHz (= 2^19 Hz !)) - ???" line): the OSC2 rate is hardware-defined, so it isn't exactly 2^19 +/-0 Hz on HW2+, though it is supposed to be reasonably accurate (I don't know how accurate). Far more than on HW1, anyway.
The other usable timers, based on AUTO_INT_3 (89, 92+, V200) and AUTO_INT_1 (all models), are derived from the same OSC2 timer, so that's probably not the way to find the exact rate. I don't know whether the HW3+ RTC is derived from OSC2: AFAICT, J89hw.txt was written years before the 89T came out and received only minor updates afterwards.

Since you're interested in waking up the CPU upon AUTO_INT_5, the value which needs to be written to port 0x600005 is 0b10000, BTW.

All good. I had figured out that the bits were the "arguments", but not that I needed to put the write in the loop. I also thought the address would be interpreted as hex, so thanks for clearing this up.

Here is what my while loop looks like now. I've attempted to do something similar to what I previously had with the float thing, but without float math.

Code:
   PRG_setStart(196); // If start was 196 for all frames, actual playback speed: 16hz * 1.0069 (found by experiment)
   PRG_setRate(1);
   
   unsigned char frameCounter = 0;
   while (currentFile < TOTAL_STR) {
      asm volatile("move.b #0b10000,0x600005;");
      frameCounter++;
      if(frameCounter == 73) { // Calculated by: (1/(1.0068 - 1)) / 2 = ~73.5 (note: I don't know why the divide by 2 shows up)
         PRG_setStart(195);
      } else if(frameCounter == 74) {
         frameCounter = 0;
         PRG_setStart(196);
      }
   }

This doesn't account for the 0.5 in my calculation, but it is somehow pretty close. Any suggestions to improve this?

Also, I've tested this:
Quote:
The OSC2 / 2^5 fast rate yields 2^14 = 16384 (HW2+), but from memory, the hardware needs to increment the internal counter exposed on port 600017 at least once for the interrupt to be triggered, so the initial value needs to be 1 lower - hence the 8192 value.

It seems that by setting both the PRG start and the rate to 0, I can get it to increment a counter approximately 16000 times per second; the interrupt seems to run without the internal counter having to be incremented in this case. It is about 8192 times per second when I set the start to 255 (in this case, the counter needs to be incremented/overflowed).

A start value of 196 means 257 - 196 = 61 increments, which would be expected to yield an AUTO_INT_5 rate of 1024 / 61 ~ 16.787 Hz. 16 * 1.0069 ~ 16.11 Hz, so it looks like the timing frequency is a bit off on your calculator, or there's something else going on.
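One way to land the average exactly on target, instead of hand-tuning a 73/74 frame counter, is an integer error accumulator (Bresenham-style) that picks, frame by frame, whichever of two adjacent start values pulls the accumulated timing error back toward zero. A minimal sketch, assuming the real counter rate is ~982.71 Hz (derived from the 16.11 Hz figure above times 61 increments, not measured independently) and that the wait loop runs once per AUTO_INT_5. `next_frame_ticks` and `simulate_ticks` are hypothetical names; rates are in hundredths of a hertz so that no float math is needed.

```c
/* Returns the number of PRG increments (257 - start) for the next frame.
   *err accumulates (ticks * fps - rate), i.e. the timing error scaled by fps. */
static unsigned short next_frame_ticks(long rate_centihz, long fps_centihz, long *err)
{
    unsigned short lo = (unsigned short)(rate_centihz / fps_centihz); /* period slightly too short */
    unsigned short hi = (unsigned short)(lo + 1);                     /* period slightly too long */
    unsigned short ticks = (*err <= 0) ? hi : lo; /* pick whichever pulls the error back */
    *err += (long)ticks * fps_centihz - rate_centihz;
    return ticks;
}

/* Runs `frames` frames; returns the total ticks and counts the long (hi) frames. */
static long simulate_ticks(long rate_centihz, long fps_centihz, int frames, int *long_frames)
{
    long err = 0, total = 0;
    unsigned short lo = (unsigned short)(rate_centihz / fps_centihz);
    *long_frames = 0;
    for (int i = 0; i < frames; i++) {
        unsigned short ticks = next_frame_ticks(rate_centihz, fps_centihz, &err);
        if (ticks != lo)
            (*long_frames)++;
        total += ticks;
    }
    return total;
}
```

With rate 98271 and target 1600 (16.00 FPS), the choice is between 61 and 62 increments (start 196 and 195), and 671 of every 1600 frames use start 195. On the calculator, the returned value would be passed as PRG_setStart(257 - ticks) once per frame.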

ACK about being able to trigger AUTO_INT_5 at a rate close to 16384 Hz. Maybe I misremembered, or maybe the 89T doesn't behave the same as older models. Anyway, that's not the way to go here.

Could you post at least the first data file for testing purposes? I'd like to work a bit on the code, to implement my previous suggestions and more, but I won't get very far without being able to perform proper tests.

EDIT 2:
* BTW, in the video, I see hundreds of frames on which multiple top rows are made of only white pixels, or only black pixels. That's less frequent with bottom rows, but it happens as well.
I think that it's worth special-casing these full-white or full-black lines, so as to be able to use just about 6 instructions to write them (moveq #0/#-1,dn; move.l dn,(an)+; move.l dn,(an)+; move.l dn,(an)+; move.l dn,(an)+; move.l dn,(an)+), and thereby save more power by sleeping more until the next frame.
* How many full-white and full-black frames are there in the video?

EDIT 3: actually, "here are N consecutive full-white/black lines at the top" and "here's a full-white/black frame" are just special cases of "here are N consecutive white/black lines". Besides, memset(LCD_MEM, 0x00/0xFF, HEIGHT * LCD_WIDTH) is slower than HEIGHT iterations of a loop based on the aforementioned full line draw. So I'd think that repurposing opcodes 253 and 252 as prefixes for 1 byte containing the number of consecutive white / black lines would be a good thing, all the more so as it would solve the flashing (potentially seizure-triggering in addition to being time-wasting, boo me) on the top right part of the screen in the code below.
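A host-side sketch of the proposed "N consecutive white/black lines" opcode, assuming the 240-pixel (30-byte) row stride of the LCD buffer and the 160 visible pixels (20 bytes) of the 89. `fill_rows` is a hypothetical name; memset() stands in for the five move.l + lea 10 sequence, which writes the same bytes.

```c
#include <stdint.h>
#include <string.h>

#define ROW_BYTES     30 /* 240-pixel-wide LCD buffer: 30 bytes per row */
#define VISIBLE_BYTES 20 /* 160 visible pixels on the 89: 20 bytes */

/* Writes only the visible bytes of `count` rows (0xFF = black, 0x00 = white),
   skipping the 10 invisible bytes of each row. Returns the next row pointer. */
static uint8_t *fill_rows(uint8_t *line, unsigned count, int black)
{
    const uint8_t value = black ? 0xFF : 0x00;
    while (count--) {
        memset(line, value, VISIBLE_BYTES);
        line += ROW_BYTES;
    }
    return line;
}
```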

EDIT 1: here's an entirely untested, and therefore certainly broken, mod of your code, showing part of what I have in mind:

Code:
#define USE_TI89
#define SAVE_SCREEN

#include <tigcclib.h>

#define WIDTH 160
#define HEIGHT 100
#define FPS 16
#define TOTAL_STR 16

// A derivative of FastDrawHLine_R from ExtGraph by TICT.
void SpecialFastDrawHLine_R(void* line asm("a0"), unsigned short x1 asm("d0"), unsigned short x2 asm("d1"), short mode asm("d3"));
asm ("
| C prototype:
| Valid values for mode are: A_REVERSE, A_NORMAL, A_REPLACE, A_OR (A_XOR removed).
|
| This routine draws a horizontal line from (x1) to (x2) on the given line address.

.text
.even
0:
.word 0xFFFF,0x7FFF,0x3FFF,0x1FFF,0x0FFF,0x07FF,0x03FF,0x01FF,0x00FF,0x007F,0x003F,0x001F,0x000F,0x0007,0x0003,0x0001

1:
.word 0x8000,0xC000,0xE000,0xF000,0xF800,0xFC00,0xFE00,0xFF00,0xFF80,0xFFC0,0xFFE0,0xFFF0,0xFFF8,0xFFFC,0xFFFE,0xFFFF

.globl SpecialFastDrawHLine_R
SpecialFastDrawHLine_R:
    move.l   %d4,%a1                         | d4 mustn't be destroyed.

    | Removed: test and fix for reversed x1 and x2.

    | Largely optimized: line address computation.
    move.w   %d0,%d4
    lsr.w    #4,%d4
    adda.w   %d4,%a0

| d4 = 8 * (x1/16 + x1/16) + 16. We add 1 before shifting instead of adding 16
| after shifting (gain: 4 clocks and 2 bytes).
    addq.w   #1,%d4                          | d4 = 8 * (x1/16 + x1/16) + 16.
    lsl.w    #4,%d4

    move.w   %d1,%d2                         | x2 is stored in d2.
    andi.w   #0xF,%d0

    add.w    %d0,%d0
    move.w   0b(%pc,%d0.w),%d0 | d0 = mask of first pixels.
    andi.w   #0xF,%d1

    add.w    %d1,%d1
    move.w   1b(%pc,%d1.w),%d1   | d1 = mask of last pixels.
    cmp.w    %d4,%d2                         | All pixels in the same word ?
    blt.s    4f
    sub.w    %d4,%d2                         | d2 = x2 - x.
    moveq.l  #32,%d4
    tst.w    %d3
    beq.s    0f

| A_NORMAL / A_OR / A_REPLACE.
    or.w     %d0,(%a0)+
    moveq    #-1,%d0
    sub.w    %d4,%d2
    blt.s    5f
6:
    move.l   %d0,(%a0)+
    sub.w    %d4,%d2
    bge.s    6b
5:
    cmpi.w   #-16,%d2
    blt.s    7f
    move.w   %d0,(%a0)+
7:
    or.w     %d1,(%a0)
    move.l   %a1,%d4
    rts

| A_REVERSE.
0:
    not.w    %d0
    and.w    %d0,(%a0)+
    moveq    #0,%d0
    sub.w    %d4,%d2
    blt.s    5f
6:
    move.l   %d0,(%a0)+
    sub.w    %d4,%d2
    bge.s    6b
5:
    cmpi.w   #-16,%d2
    blt.s    8f
    move.w   %d0,(%a0)+
8:
    not.w    %d1
    and.w    %d1,(%a0)
    move.l   %a1,%d4
    rts

4:
    and.w    %d0,%d1
    tst.w    %d3
    beq.s    8b
    or.w     %d1,(%a0)
    move.l   %a1,%d4
    rts
");

static inline SYM_ENTRY * openBinVAT(const char *symptrName) { // Quicker file reader than default fopen. Thanks Lionel
   return DerefSym(SymFind(symptrName));
}

volatile unsigned short currentFile = 0;
unsigned char * dataPtr;
unsigned char * dataBlockEndPtr;
unsigned char * datas[TOTAL_STR];
unsigned short dataLengths[TOTAL_STR];

DEFINE_INT_HANDLER(DrawFrameInt) {
   unsigned char x0 = 0, x1 = 0, currentColor = 0, lastByte = 0; // Initialize vars for writing
   unsigned char * line = LCD_MEM;

   while (dataPtr < dataBlockEndPtr) { // Draw a frame (essentially copied from for loop, with breaks instead of wait for frame)
      unsigned char nextByte = *dataPtr++;
      if (line == LCD_MEM + LCD_SIZE) {
         // No need to reinitialize variables.
         break;
      }
      unsigned char newColor = !currentColor; // Defined here so it doesnt have to be recalculated.
      if (nextByte == 255 || (lastByte < WIDTH && nextByte < lastByte)) { // New horizontal line? Then finish the previous row.
         SpecialFastDrawHLine_R(line, x0, WIDTH - 1, newColor);
         x0 = 0;
         x1 = 0;
         line += LCD_WIDTH;
      }
      if (nextByte < WIDTH) { // This is the normal line draw. Majority
         x1 = nextByte;
         SpecialFastDrawHLine_R(line, x0, x1, newColor);
         currentColor = newColor;
         x0 = x1;
      } else if (nextByte == 254) { // This is if the frame has not changed. Just wait.
         break;
      } else if (nextByte == 253) { // Draw a full black screen, and wait
         memset(LCD_MEM, 0xFF, HEIGHT * LCD_WIDTH); // That draws also pixels we don't need, but it's still much faster than HEIGHT calls to a generic horizontal line drawing function. Will look like crap on the 92+/V200, though. See EDIT 3.
         break;
      } else if (nextByte == 252) { // Draw an empty frame, wait
         memset(LCD_MEM, 0x00, HEIGHT * LCD_WIDTH); // Ditto.
         break;
      }
      lastByte = nextByte;
   }
   short row = _rowread(0x0000);
   if (dataPtr >= dataBlockEndPtr || row == 8) { // TODO portable row reading.
      if (currentFile < sizeof(datas) / sizeof(datas[0])) {
         currentFile++;
      }
      dataPtr = datas[currentFile];
      dataBlockEndPtr = dataPtr + dataLengths[currentFile];
   } else if (row == 2) { // TODO portable row reading.
      if (currentFile > 0) {
         currentFile--;
      }
      dataPtr = datas[currentFile];
      dataBlockEndPtr = dataPtr + dataLengths[currentFile];
   }
}

// currentFile = -1 is invalid.
/*DEFINE_INT_HANDLER(BreakDrawInt) {
   currentFile = -1;
}*/

void _main(void) {
   static const char VARNAME_STRS[TOTAL_STR][9] = {
      "\0ba\\px01", "\0ba\\px02", "\0ba\\px03", "\0ba\\px04",
      "\0ba\\px05", "\0ba\\px06", "\0ba\\px07", "\0ba\\px08",
      "\0ba\\px09", "\0ba\\px10", "\0ba\\px11", "\0ba\\px12",
      "\0ba\\px13", "\0ba\\px14", "\0ba\\px15", "\0ba\\px16",
   };
   SYM_ENTRY *symPtrs[TOTAL_STR];
   short missingfile = FALSE;

   ClrScr(); // Initial setup

   for (unsigned short fileI = 0; fileI < TOTAL_STR; fileI++) { // Set up direct pointers to data while locking memory blocks.
      SYM_ENTRY * symPtr = openBinVAT(VARNAME_STRS[fileI] + sizeof(VARNAME_STRS[0]) - 1); // Define the data using the vat
      if (symPtr == NULL) {
         missingfile = TRUE;
         DrawStr(0, 0, VARNAME_STRS[fileI], A_NORMAL);
         GKeyIn(NULL, 0);
      }
      unsigned char* data = HLock(symPtr->handle);
      data += 2; // Offset from VAT length data

      dataLengths[fileI] = *(unsigned short *)data; // Define the length of the data
      datas[fileI] = data + 2; // Offset from real length data
      symPtrs[fileI] = symPtr;
   }

   if (missingfile) {
      return;
   }

   // Reinitialize global variables.
   dataPtr = datas[0];
   dataBlockEndPtr = datas[0] + dataLengths[0];
   currentFile = 0;

   INT_HANDLER oldInt5 = GetIntVec(AUTO_INT_5); // Save default stuff
   INT_HANDLER onInt = GetIntVec(INT_VEC_ON_KEY_PRESS);
   unsigned char oldStart = PRG_getStart();

   asm volatile("trap #12; move.w #0x0200,%sr"); // Set defaults to new stuff
   SetIntVec(AUTO_INT_5, DrawFrameInt);
   //SetIntVec(INT_VEC_ON_KEY_PRESS, BreakDrawInt);

   PRG_setStart(196); // If start was 196 for all frames, actual playback speed: 16hz * 1.0069 (found by experiment)

   PRG_setStart(196);
   PRG_setRate(1);

   unsigned char frameCounter = 0;
   while (currentFile < TOTAL_STR) {
      asm volatile("move.b #0b10000,0x600005");
      frameCounter++;
      if (frameCounter == 73) { // Calculated by: (1/(1.0068 - 1)) / 2 = ~73.5 (note: i dont know why divide by 2 shows up)
         PRG_setStart(195);
      } else if (frameCounter == 74) {
         frameCounter = 0;
         PRG_setStart(196);
      }
   }
   GKeyFlush();

   asm volatile("trap #12; move.w #0x0000,%sr"); // Restore the program / defaults
   SetIntVec(AUTO_INT_5, oldInt5);
   SetIntVec(INT_VEC_ON_KEY_PRESS, onInt);
   PRG_setStart(oldStart);

   for (unsigned short fileI = 0; fileI < TOTAL_STR; fileI++) { // Unlock memory blocks.
      HeapUnlock(symPtrs[fileI]->handle);
   }
}

Build invocation taking advantage of the toolchain's possibilities and peculiarities (*):

Code:
tigcc -std=gnu99 -O3 -Wall -W -Wwrite-strings -fomit-frame-pointer -mregparm=5 -ffunction-sections -fdata-sections -mno-bss -Wa,-l -mpcrel --optimize-code --cut-ranges -v v4.c -o v4

Resulting binary:

Code:
Program Statistics:
  Program Variable Name:                    main\v4
  Program Variable Size:                    1871 Bytes
  Absolute Relocs:                          0
  Natively Emitted Relocs:                  0
  Relocs Removed by Branch Optimization:    1
  Space Saved by Range-Cutting:             4 Bytes


*: I stopped using both the IDE and tprbuilder long ago, except for e.g. TICT-Explorer, which I had already converted to TPRs before I hit their limitations, and for which I subsequently contributed an improvement to tprbuilder to make TPRs usable for my (simple) use case. Too bad I had failed to understand the complaints about TPRs posted by some other programmers before I hit one of the limitations hard myself...

Thanks for the suggestion regarding encoding. This resulted in one fewer file for this case. Funnily enough, I had a comment in my Python code about this optimization, but I forgot.

Specifically, the code run-length encodes consecutive runs of 255 that are longer than 1, using the value 250 followed by the number of repeats (but only within the current frame). I've implemented this new encoding in the data files and in my main C file. This made the black and white frame opcodes pretty useless, so I just removed them and simplified the if statement.
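A minimal C sketch of that encoding, for illustration only (the actual encoder is Python code not shown here). It assumes 250 never occurs as a literal in a frame's byte stream, which holds as long as x coordinates stay below WIDTH (160) and the only other opcodes are 254 and 255. `encode_frame` and `decode_frame` are hypothetical names.

```c
#include <stddef.h>

#define OP_NEWLINE 255 /* "finish the current row, start the next one" */
#define OP_RUN     250 /* followed by one byte: number of consecutive 255s */

/* Encodes one frame: every run of two or more 255s becomes {250, count};
   runs longer than 255 are split so the count fits in one byte.
   The output is never longer than the input. Returns the encoded length. */
static size_t encode_frame(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; ) {
        if (in[i] != OP_NEWLINE) {
            out[o++] = in[i++];
            continue;
        }
        size_t run = 0;
        while (i + run < len && in[i + run] == OP_NEWLINE && run < 255)
            run++;
        if (run == 1) {
            out[o++] = OP_NEWLINE;
        } else {
            out[o++] = OP_RUN;
            out[o++] = (unsigned char)run;
        }
        i += run;
    }
    return o;
}

/* Inverse transform, handy for checking the encoder. Returns the decoded length. */
static size_t decode_frame(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == OP_RUN && i + 1 < len) {
            for (unsigned char k = 0; k < in[i + 1]; k++)
                out[o++] = OP_NEWLINE;
            i++; /* skip the count byte */
        } else {
            out[o++] = in[i];
        }
    }
    return o;
}
```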

Here are all of the files I use for testing. Included are: the 15 TI arbitrary data files, the main C code, the ExtGraph header, and the ExtGraph archive. With the code files, you can recreate the project (be sure to set no minimum version; this always gets me). I've also included a TiEmu savestate with all 15 data files on a fresh install of the OS. If the savestate doesn't work, I would recommend making a savestate after transferring all of the data files over. This way it's easy to test.

Here is the folder containing these files: https://github.com/Carpany/ti89t-video-player

The other encoding optimization I was thinking of was to find the smallest possible region of the frame that changed compared to the previous frame, encode 4 bytes describing this region, then only keep track of changes in that region. This way, only the required lines are drawn. Despite adding 4 bytes for every frame, it might be worth it for saving power and calls to line drawing. In this case there are about 3500 frames, so ~14000 bytes would be added. While thinking about how I could find this region, I realized that in some cases there should probably be multiple regions, and that is when I (probably wrongly) decided against implementing it. I might consider this again.
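For the single-region case, finding the region is a plain bounding-box scan over the bytes that differ between consecutive frames. A minimal sketch, assuming 1-bpp frames stored as 20 bytes x 100 rows; `changed_region` and `Region` are hypothetical names, and x is measured in bytes (8-pixel columns) so that each coordinate fits in one of the 4 bytes. Multiple regions would need more than this.

```c
#define FRAME_W_BYTES 20 /* 160 pixels at 1 bpp */
#define FRAME_H       100

typedef struct { unsigned char x0, y0, x1, y1; } Region; /* inclusive; x in bytes, y in rows */

/* Bounding box of the bytes that differ between two frames.
   Returns 0 when nothing changed (the existing 254 opcode covers that case). */
static int changed_region(const unsigned char *prev, const unsigned char *next, Region *r)
{
    int found = 0;
    for (int y = 0; y < FRAME_H; y++) {
        for (int x = 0; x < FRAME_W_BYTES; x++) {
            if (prev[y * FRAME_W_BYTES + x] == next[y * FRAME_W_BYTES + x])
                continue;
            if (!found) { /* rows are scanned top to bottom, so this row is y0 */
                r->x0 = r->x1 = (unsigned char)x;
                r->y0 = (unsigned char)y;
                found = 1;
            }
            if (x < r->x0) r->x0 = (unsigned char)x;
            if (x > r->x1) r->x1 = (unsigned char)x;
            r->y1 = (unsigned char)y; /* last differing row seen so far */
        }
    }
    return found;
}
```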

As expected, the code was broken. I noticed that your variable names add a leading zero to the file names for the first 9 files, but I didn't generate the names like this. Other than that, I'm not quite sure what is wrong. But it was nonetheless a good attempt without being able to test.

EDIT: I saw that you said currentFile = -1 is invalid, but since currentFile is unsigned, doesn't -1 wrap around to a huge value, breaking the while loop condition? As I've said earlier, I do not know much about C. Regardless, it seems to work.
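The wrap-around itself is well-defined in C: converting -1 to an unsigned type yields the maximum value of that type, so with a 16-bit unsigned short (as on TIGCC) the wait loop does exit. What stays risky is any datas[currentFile] or dataLengths[currentFile] access with that value, far past the end of the arrays. A minimal sketch; `keeps_playing` and `minus_one_as_ushort` are hypothetical names, and TOTAL_STR is 15 as in the later version of the code.

```c
#define TOTAL_STR 15

/* Same condition as the wait loop in _main: while (currentFile < TOTAL_STR). */
static int keeps_playing(unsigned short currentFile)
{
    return currentFile < TOTAL_STR;
}

/* What `currentFile = -1;` stores: the conversion is modulo 2^16 for a
   16-bit unsigned short, giving 65535. */
static unsigned short minus_one_as_ushort(void)
{
    unsigned short v = (unsigned short)-1;
    return v;
}
```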

Thanks, that will make it possible for me to test my changes, which make ExtGraph superfluous.
The minimum OS version can just be set in the code, instead of the silly TPR, by #define MIN_AMS 100 alongside e.g. #define USE_TI89, before #including the environment's headers.

The leading 0x00 for file names is exactly what SYMSTR(...) does internally, but at compile time, instead of dynamically calling strlen() and memcpy(), resulting in much smaller code (and faster, but that's not really relevant):

Code:
#define SYMSTR(s) ({register unsigned short __slen=_rom_call(unsigned long,(const char*),27E)(s);ESI __tempstr=alloca(__slen+2);__tempstr[0]=0;_rom_call(void*,(void*,const void*,long),26A)(__tempstr+1,(s),__slen+1);__tempstr+__slen+1;})


Using two digits for every file's name was for consistency, though it's not a requirement for the 9-byte fixed-size string optimization.
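A small sketch of why the fixed 9-byte entries work: each entry is a leading '\0', a 7-character folder\name, and the implicit terminating '\0', so the SYMSTR-style pointer is always entry + 8 and no strlen() is needed at run time. `symstr_ptr` and `symstr_name` are hypothetical helpers; the backward walk only illustrates how such a pointer can be decoded, not the OS's actual lookup code.

```c
#include <stddef.h>

/* Same layout as VARNAME_STRS in the code above: 9 bytes per entry. */
static const char VARNAME_STRS[][9] = { "\0ba\\px01", "\0ba\\px10" };

/* Pointer to the terminating '\0', i.e. what SYMSTR() would return. */
static const char *symstr_ptr(size_t i)
{
    return VARNAME_STRS[i] + sizeof(VARNAME_STRS[0]) - 1;
}

/* Recovers the start of the name by walking back to the leading '\0'. */
static const char *symstr_name(const char *end)
{
    const char *p = end - 1;
    while (*p != '\0')
        p--;
    return p + 1;
}
```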

I removed the currentFile = -1 to prevent access to index -1 of the datas and dataLengths arrays, but there will need to be at least one more check to actually do so reliably.

EDIT: wow. Mediafire's cookie policy has no easy way to collectively opt out of cookies set by dozens of advertisers, and pressing the download button triggers popups (in 2023!). Since I don't need either the extgraph.* files (I'm upstream for those, and they shouldn't be duplicated everywhere anyway) or a TIEmu save file, you should use a GitHub repository instead.

Thanks. I've edited the message with a GitHub link to the same stuff.

Thanks, GitHub's much better.

Alright, here's a version that doesn't have multiple obvious crashing bugs during the first frame...
It doesn't work well just yet: most frames are split over the top and bottom of the screen, with reversed colors, and a shifting split point (which probably resets upon switching to a new file). Also, when playing enough pieces of the video, the program easily hangs the calculator upon exit, or even triggers an Address Error, which are clear signs of memory corruption. However, in that state, it can at least be salvaged, and it's both faster and smaller than your version. EDIT: AFAICS, that was due to an unconditional increment of the data pointer at the beginning of the while (dataPtr < dataBlockEndPtr) { loop, now fixed.

In the worst case, I saved dozens of processor clocks per call to SpecialFastDrawHLine_R by removing the reversed arguments check, cutting the address computation, and removing one of the two mode tests and the need to push / pop the mode argument. In the worst case, there are 2 or 3 calls to that function per screen row, i.e. 200-300 calls per frame and therefore 3200-4800 calls per second at 16 FPS, yielding 160K-240K processor clocks at ~50 clocks saved per call (it's clearly significantly more than 25, and significantly less than 100). Sure, that's much less than 1 FPS worth of the ~12-14 MHz budget, but it still means the program can save a tiny bit more power in the worst case.
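To check an optimized routine like SpecialFastDrawHLine_R on a PC, a plain-C reference can be compared against it pixel by pixel. A minimal sketch, assuming x1 <= x2 (the swap was removed), a 30-byte row stride, big-endian 16-bit words with bit 15 as the leftmost pixel, and a nonzero mode meaning "set pixels" (the A_NORMAL / A_OR / A_REPLACE path) while zero means "clear pixels" (the A_REVERSE path). The two mask functions reproduce the two .word tables; `ref_hline`, `first_mask` and `last_mask` are hypothetical names.

```c
#include <stdint.h>

#define ROW_BYTES 30 /* 240-pixel-wide buffer */

/* first_mask(i) keeps pixels i..15 of a word; last_mask(i) keeps pixels 0..i. */
static uint16_t first_mask(unsigned i) { return (uint16_t)(0xFFFFu >> i); }
static uint16_t last_mask(unsigned i)  { return (uint16_t)(0xFFFFu << (15 - i)); }

/* Draws pixels x1..x2 (inclusive) of one row; black != 0 sets, black == 0 clears. */
static void ref_hline(uint8_t *row, unsigned x1, unsigned x2, int black)
{
    for (unsigned w = x1 / 16; w <= x2 / 16; w++) {
        uint16_t mask = 0xFFFF;
        if (w == x1 / 16) mask &= first_mask(x1 % 16);
        if (w == x2 / 16) mask &= last_mask(x2 % 16);
        uint16_t word = (uint16_t)((row[2 * w] << 8) | row[2 * w + 1]); /* big-endian */
        word = black ? (uint16_t)(word | mask) : (uint16_t)(word & ~mask);
        row[2 * w] = (uint8_t)(word >> 8);
        row[2 * w + 1] = (uint8_t)word;
    }
}
```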
With those postincremented moves (I bypassed the stupid compiler, which produces slower code, almost twice as large, for these five instructions...), the best case of a full black / white frame is almost as fast as it can be.


Code:
#define USE_TI89
#define SAVE_SCREEN
#define MIN_AMS 100
#define OPTIMIZE_ROM_CALLS

#include <tigcclib.h>

#define WIDTH 160
#define HEIGHT 100
#define FPS 16
#define TOTAL_STR 15

// A derivative of FastDrawHLine_R from ExtGraph by TICT.
void SpecialFastDrawHLine_R(void* line asm("a0"), unsigned short x1 asm("d0"), unsigned short x2 asm("d1"), unsigned short mode asm("d3"));
asm ("
| Valid values for mode are: A_REVERSE, A_NORMAL, A_REPLACE, A_OR (A_XOR removed).
|
| This routine draws a horizontal line from (x1) to (x2) on the given line address.

.text
.even
0:
.word 0xFFFF,0x7FFF,0x3FFF,0x1FFF,0x0FFF,0x07FF,0x03FF,0x01FF,0x00FF,0x007F,0x003F,0x001F,0x000F,0x0007,0x0003,0x0001

1:
.word 0x8000,0xC000,0xE000,0xF000,0xF800,0xFC00,0xFE00,0xFF00,0xFF80,0xFFC0,0xFFE0,0xFFF0,0xFFF8,0xFFFC,0xFFFE,0xFFFF

.globl SpecialFastDrawHLine_R
SpecialFastDrawHLine_R:
    move.l   %d4,%a1                         | d4 mustn't be destroyed.

    | Removed: test and fix for reversed x1 and x2.

    | Largely optimized: line address computation.
    move.w   %d0,%d4
    lsr.w    #4,%d4
    adda.w   %d4,%a0
    adda.w   %d4,%a0

| d4 = 8 * (x1/16 + x1/16) + 16. We add 1 before shifting instead of adding 16
| after shifting (gain: 4 clocks and 2 bytes).
    addq.w   #1,%d4                          | d4 = 8 * (x1/16 + x1/16) + 16.
    lsl.w    #4,%d4

    move.w   %d1,%d2                         | x2 is stored in d2.
    andi.w   #0xF,%d0

    add.w    %d0,%d0
    move.w   0b(%pc,%d0.w),%d0 | d0 = mask of first pixels.
    andi.w   #0xF,%d1

    add.w    %d1,%d1
    move.w   1b(%pc,%d1.w),%d1   | d1 = mask of last pixels.
    cmp.w    %d4,%d2                         | All pixels in the same word ?
    blt.s    4f
    sub.w    %d4,%d2                         | d2 = x2 - x.
    moveq.l  #32,%d4
    tst.w    %d3
    beq.s    0f

| A_NORMAL / A_OR / A_REPLACE.
    or.w     %d0,(%a0)+
    moveq    #-1,%d0
    sub.w    %d4,%d2
    blt.s    5f
6:
    move.l   %d0,(%a0)+
    sub.w    %d4,%d2
    bge.s    6b
5:
    cmpi.w   #-16,%d2
    blt.s    7f
    move.w   %d0,(%a0)+
7:
    or.w     %d1,(%a0)
    move.l   %a1,%d4
    rts

| A_REVERSE.
0:
    not.w    %d0
    and.w    %d0,(%a0)+
    moveq    #0,%d0
    sub.w    %d4,%d2
    blt.s    5f
6:
    move.l   %d0,(%a0)+
    sub.w    %d4,%d2
    bge.s    6b
5:
    cmpi.w   #-16,%d2
    blt.s    8f
    move.w   %d0,(%a0)+
8:
    not.w    %d1
    and.w    %d1,(%a0)
    move.l   %a1,%d4
    rts

4:
    and.w    %d0,%d1
    tst.w    %d3
    beq.s    8b
    or.w     %d1,(%a0)
    move.l   %a1,%d4
    rts
");

#define ECHO_PREVENTION_DELAY 150
void WaitForMillis(register unsigned short asm("%d2"));

asm("xdef WaitForMillis\n"
"WaitForMillis:  move.l %d3,-(%sp)\n"
"           moveq  #31,%d1\n"
"           moveq  #31,%d3\n"
"_wl2_:     move.w #132,%d0    /* modify this value for exact timing !!! */\n"
"_wl1_:     rol.l  %d3,%d1\n"
"           dbf    %d0,_wl1_\n"
"           dbf    %d2,_wl2_\n"
"           move.l (%sp)+,%d3\n"
"           rts");


static inline SYM_ENTRY * openBinVAT(const char *symptrName) { // Quicker file reader than default fopen. Thanks Lionel
   return DerefSym(SymFind(symptrName));
}

volatile unsigned short currentFile = 0;
unsigned char * gdataPtr;
unsigned char * gdataBlockEndPtr;
unsigned char * datas[TOTAL_STR];
unsigned short dataLengths[TOTAL_STR];

DEFINE_INT_HANDLER(BreakDrawInt) {
   SetIntVec(AUTO_INT_5, DUMMY_HANDLER);
   currentFile = TOTAL_STR; // Only needed because of while loop.
}

DEFINE_INT_HANDLER(DrawFrameInt) {
   unsigned char x0 = 0, x1 = 0, lastByte = 0; // Initialize vars for writing
   unsigned short currentColor = 0;
   unsigned char * line = LCD_MEM;
   unsigned char *dataPtr = gdataPtr;
   unsigned char *dataBlockEndPtr = gdataBlockEndPtr;

   while (dataPtr < dataBlockEndPtr) { // Draw a frame (essentially copied from for loop, with breaks instead of wait for frame)
      unsigned char nextByte = *dataPtr;
      if (line >= (unsigned char *)LCD_MEM + 30 * HEIGHT) {
         // No need to reinitialize variables.
         break;
      }
      dataPtr++;
      unsigned short newColor = ~currentColor; // Defined here so it doesn't have to be recalculated.
      if (nextByte == 255 || nextByte == 250 || (lastByte < WIDTH && nextByte < lastByte)) { // New horizontal line? Then finish the previous row.
         SpecialFastDrawHLine_R(line, x0, WIDTH - 1, newColor);
         x0 = 0;
         x1 = 0;
         line += 30; // The screen buffer is 240 pixels wide.
      }
      if (nextByte < WIDTH) { // This is the normal line draw. Majority
         x1 = nextByte;
         SpecialFastDrawHLine_R(line, x0, x1, newColor);
         currentColor = newColor;
         x0 = x1;
      } else if (nextByte == 254) { // This is if the frame has not changed. Just wait.
         break;
      } else if (nextByte == 250) { // Next byte will be repeats of 255
         unsigned char repeats = (*dataPtr) - 1;
         unsigned long value = newColor ? 0xFFFFFFFF : 0;
         for (unsigned char i = 0; i < repeats; i++) {
            asm volatile("move.l %1,(%0)+; move.l %1,(%0)+; move.l %1,(%0)+; move.l %1,(%0)+; move.l %1,(%0)+; lea 10(%0),%0" : "+a" (line) : "d" (value) : "cc", "memory"); // "+a": line is both read and updated; "memory": the asm writes to the screen.
            /**(((unsigned long *)line)++) = value; // The compiler generates stupid code.
            *(((unsigned long *)line)++) = value;
            *(((unsigned long *)line)++) = value;
            *(((unsigned long *)line)++) = value;
            *(((unsigned long *)line)++) = value;
            line += 10;*/
         }

         /*unsigned short repeats = (unsigned short)(*dataPtr) - 2; // minus 1 since the unfinished line is finished above, but dbf also needs -1.
         uint32_t value;
         if (newColor) {
            value = 0xFFFFFFFF;
         }
         else {
            value = 0;
         }
         asm volatile("0: move.l %2,(%0)+; move.l %2,(%0)+; move.l %2,(%0)+; move.l %2,(%0)+; move.l %2,(%0)+; lea 10(%0),%0; dbf %1, 0b" : "+a" (line), "+d" (repeats) : "d" (value) : "cc", "memory");*/
         dataPtr++;
      }
      lastByte = nextByte;
   }
   short row = _rowread(0x0000);
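   if (dataPtr >= dataBlockEndPtr || row == 8) { // TODO portable row reading.
      if (currentFile + 1 >= TOTAL_STR) { // Past the last file: never read datas[TOTAL_STR].
         currentFile = TOTAL_STR; // Lets the wait loop in _main exit.
         SetIntVec(AUTO_INT_5, DUMMY_HANDLER); // Nothing left to draw.
         return;
      }
      currentFile++;
      dataPtr = datas[currentFile];
      dataBlockEndPtr = dataPtr + dataLengths[currentFile];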
   } else if (row == 2) { // TODO portable row reading.
      if (currentFile > 0) {
         currentFile--;
      }
      dataPtr = datas[currentFile];
      dataBlockEndPtr = dataPtr + dataLengths[currentFile];
   } else if(row == 1) { // pressing enter, pause
      SetIntVec(AUTO_INT_5, DUMMY_HANDLER);
      while(_rowread(0x0000)); // Wait to let go of key.
      WaitForMillis(ECHO_PREVENTION_DELAY);
      while(_rowread(0x0000) != 1); // Wait for second press.
      while(_rowread(0x0000)); // Wait to let go of key.
      WaitForMillis(ECHO_PREVENTION_DELAY);
      SetIntVec(AUTO_INT_5, DrawFrameInt);
   }

   gdataPtr = dataPtr;
   gdataBlockEndPtr = dataBlockEndPtr;
}

void _main(void) {
   static const char VARNAME_STRS[TOTAL_STR][9] = {
      "\0ba\\px01", "\0ba\\px02", "\0ba\\px03", "\0ba\\px04",
      "\0ba\\px05", "\0ba\\px06", "\0ba\\px07", "\0ba\\px08",
      "\0ba\\px09", "\0ba\\px10", "\0ba\\px11", "\0ba\\px12",
      "\0ba\\px13", "\0ba\\px14", "\0ba\\px15",
   };
   SYM_ENTRY *symPtrs[TOTAL_STR];
   short missingfile = FALSE;

   ClrScr(); // Initial setup

   for (unsigned short fileI = 0; fileI < TOTAL_STR; fileI++) { // Set up direct pointers to data while locking memory blocks.
      SYM_ENTRY * symPtr = openBinVAT(VARNAME_STRS[fileI] + sizeof(VARNAME_STRS[0]) - 1); // Define the data using the vat
      symPtrs[fileI] = symPtr;
      if (symPtr == NULL) {
         missingfile = TRUE;
         DrawStr(0, 0, VARNAME_STRS[fileI] + 1, A_NORMAL); // + 1 skips the leading '\0', which would otherwise print an empty string.
         GKeyIn(NULL, 0);
         continue; // Don't dereference NULL below.
      }
      unsigned char* data = HLock(symPtr->handle);
      data += 2; // Offset from VAT length data

      dataLengths[fileI] = *(unsigned short *)data; // Define the length of the data
      datas[fileI] = data + 2; // Offset from real length data
   }

   if (missingfile) {
      for (unsigned short fileI = 0; fileI < TOTAL_STR; fileI++) { // Unlock the blocks that were locked.
         if (symPtrs[fileI] != NULL) {
            HeapUnlock(symPtrs[fileI]->handle);
         }
      }
      return;
   }

   // Reinitialize global variables.
   gdataPtr = datas[0];
   gdataBlockEndPtr = datas[0] + dataLengths[0];
   currentFile = 0;

   INT_HANDLER oldInt5 = GetIntVec(AUTO_INT_5); // Save default stuff
   INT_HANDLER onInt = GetIntVec(INT_VEC_ON_KEY_PRESS);
   unsigned char oldStart = PRG_getStart();
   //unsigned char oldRate = PRG_getRate();
   //unsigned char oldFont = FontSetSys(F_4x6);

   while(_rowread(0x0000)); // Wait to let go of program run button (enter)

   asm volatile("trap #12; move.w #0x0200,%sr"); // Set defaults to new stuff
   //asm("0: bra.s 0b");
   SetIntVec(AUTO_INT_5, DrawFrameInt);
   SetIntVec(INT_VEC_ON_KEY_PRESS, BreakDrawInt);

   PRG_setStart(196); // If start was 196 for all frames, actual playback speed: 16hz * 1.0069 (found by experiment)
   //PRG_setRate(1); // The OS does not work correctly if the rate is not 1 anyway.

   unsigned char frameCounter = 0;
   while (currentFile < TOTAL_STR) {
      asm volatile("move.b #0b10000,0x600005");
      frameCounter++;
      if (frameCounter == 73) { // Calculated by: (1/(1.0068 - 1)) / 2 = ~73.5 (note: I don't know why the divide by 2 shows up)
         PRG_setStart(195);
      } else if (frameCounter == 74) {
         frameCounter = 0;
         PRG_setStart(196);
      }
   }
   WaitForMillis(ECHO_PREVENTION_DELAY);

   asm volatile("trap #12; move.w #0x0000,%sr"); // Supervisor mode, then SR = 0x0000 (mask level 0, user mode); restore the OS's settings below
   SetIntVec(INT_VEC_ON_KEY_PRESS, onInt);
   SetIntVec(AUTO_INT_5, oldInt5);
   //FontSetSys(oldFont);
   //PRG_setRate(oldRate);
   PRG_setStart(oldStart);

   GKeyFlush();
   OSClearBreak();

   for (unsigned short fileI = 0; fileI < TOTAL_STR; fileI++) { // Unlock memory blocks.
      HeapUnlock(symPtrs[fileI]->handle);
   }
}

Build invocation:

Code:
tigcc -std=gnu99 -Os -Wall -W -Wwrite-strings -fomit-frame-pointer -mregparm=5 -ffunction-sections -fdata-sections -mno-bss -Wa,-l -mpcrel --optimize-code --cut-ranges --remove-sections -save-temps -v main.c -o main

(-save-temps keeps the generated assembly file instead of deleting it, so that it can be inspected)

Program Statistics:

Code:
  Program Variable Name:                    main\main
  Program Variable Size:                    1623 Bytes
  Absolute Relocs:                          0
  Natively Emitted Relocs:                  0
  Relocs Removed by Branch Optimization:    1
  Space Saved by Range-Cutting:             4 Bytes


NOTE: the code's compactness relies on all the external variable names having the same length, which is why I used two digits for files 1-9 and renamed the files locally.
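The same-length requirement comes from the lookup expression `VARNAME_STRS[fileI] + sizeof(VARNAME_STRS[0]) - 1`, which only works if `VARNAME_STRS` is a 2D `char` array (its declaration isn't shown above, so that is an inference from the `sizeof` arithmetic). Every row is then the same size, and the pointer to each name's terminating zero byte is a constant offset from the start of the row: no `strlen()` call and no separate table of pointers. A minimal, host-testable sketch with a hypothetical three-entry table (`NAMES`, `name_end` and `name_text` are illustrative names, not taken from the program):

```c
/* Hypothetical table: every entry has the same length, so each row of the
   2D array is exactly sizeof(NAMES[0]) bytes, including the final NUL.
   The leading "\0" mimics the SYMSTR-style layout used in the thread's code,
   where the VAT lookup receives a pointer to the terminating zero byte. */
#define NAME_COUNT 3
static const char NAMES[NAME_COUNT][sizeof("\0ba\\px01")] = {
    "\0ba\\px01", "\0ba\\px02", "\0ba\\px10",
};

/* Pointer to the terminating NUL of entry i: a constant offset from the
   start of the row, valid only because all names have the same length. */
static const char *name_end(unsigned i)
{
    return NAMES[i] + sizeof(NAMES[0]) - 1;
}

/* Printable form of entry i (skips the leading zero byte), e.g. for DrawStr. */
static const char *name_text(unsigned i)
{
    return NAMES[i] + 1;
}
```

If one name were shorter, its row would be padded with extra zero bytes, so the last byte of the row would no longer sit right after the name's final character; a SYMSTR-style lookup that reads backwards from that byte would then see an empty name.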
Appreciate all of your help :) I'll generate the files with zero-padded numbers so that all the names have the same length. I'll play around with this more tomorrow.
Thanks again for all your help :)

Everything is working nicely. I found an acceptable way to keep the video in sync at any framerate: find the PRG start values that give framerates just above and just below the target, and alternate between them. The two framerates balance each other out, and if both are fairly close to the target FPS, the switching isn't very noticeable:

Code:
   /*
      209 = 21.33 FPS (every 3 seconds: +4 frames)
      205 = 19.6923077 FPS (every 13 seconds: -4 frames)
      Target = 20 FPS
      After 3 seconds at start 209, 64 frames will have been shown when only 60 should have. Then play at 19.69 FPS (start 205) for 13 seconds to absorb those 4 extra frames.
   */
      unsigned short frameCounter = 0;
      PRG_setStart(209);
      while (currentFile < TOTAL_STR) {
         if (frameCounter == 64) {          // 3 s at 209 done: switch to the slower rate
            PRG_setStart(205);
         } else if (frameCounter == 320) {  // 64 + 256 frames = 16 s: start the next cycle
            PRG_setStart(209);
            frameCounter = 0;
         }
         asm volatile("move.b #0b10000,0x600005"); // Low-power wait for the next interrupt
         frameCounter++;
      }
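The 209/205 schedule can be checked with integer arithmetic alone. Assuming, as discussed earlier in the thread for HW2+, that the PRG ticks at 1024 Hz and that AUTO_INT_5 fires every `257 - start` ticks (this reproduces the 21.33 and 19.69 FPS figures above), one 320-frame cycle lasts 64 * 48 + 256 * 52 = 16384 ticks, i.e. exactly 16 s, for an average of exactly 20 FPS. A small sketch (`prg_period_ticks` and `schedule_is_exact` are hypothetical helpers):

```c
/* Assumption (HW2+, per the thread): the PRG ticks at OSC2 / 2^9 = 1024 Hz
   and AUTO_INT_5 fires every (257 - start) ticks. */
#define PRG_TICK_HZ 1024UL

/* Interrupt period, in PRG ticks, for a given start value. */
static unsigned long prg_period_ticks(unsigned char start)
{
    return 257UL - start;
}

/* Returns 1 if playing n_a frames at start_a followed by n_b frames at
   start_b takes exactly (n_a + n_b) / fps seconds, i.e. the average
   framerate over one cycle is exactly fps. Integer math only. */
static int schedule_is_exact(unsigned char start_a, unsigned long n_a,
                             unsigned char start_b, unsigned long n_b,
                             unsigned long fps)
{
    unsigned long ticks = n_a * prg_period_ticks(start_a)
                        + n_b * prg_period_ticks(start_b);
    /* ticks / PRG_TICK_HZ seconds == (n_a + n_b) / fps seconds */
    return ticks * fps == (n_a + n_b) * PRG_TICK_HZ;
}
```

This ignores exactly when a newly written start value takes effect in hardware; if it only applies from the following interrupt onwards, the real cycle is shifted by one frame.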

The only thing I couldn't get working was the --remove-sections flag; the compiler just doesn't recognize it. All the other flags work. Here is the output with --remove-sections:

Code:
cc1.exe: error: unrecognized command line option "-fremove-sections"

Without --remove-sections, everything works.
How are you determining the admissible framerate, BTW?

--remove-sections is a linker flag, like --optimize-code and --cut-ranges; in the IDE, you need to tick the corresponding box, or equivalently, edit the .tpr file to set a 1 in the appropriate line :)
Is the correct flag "--remove-unused"? That is the tickbox for Remove Unused Sections, I think.

I am working on a way to calculate the two PRG start values, and how long to play at each, before the program is compiled. For now, the values are just hardcoded, as I was trying to get something that works as a proof of concept :)
I finished the code for calculating the two PRG start values. I'm not entirely satisfied with this solution, but it's not bad. The best solution is to use a framerate that divides 1024 evenly, so that a single start value is exact.
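One possible way to pick the two start values at build time (not necessarily what the poster ended up with), under the same assumption of a 1024 Hz PRG and an interrupt every `257 - start` ticks: write 1024 = q * fps + r. Playing fps - r frames at a period of q ticks and r frames at q + 1 ticks takes (fps - r) * q + r * (q + 1) = 1024 ticks, i.e. exactly one second per fps frames. When r is 0, fps divides 1024 and a single start value is exact, which is why such framerates are the best case. `FrameSchedule` and `plan_schedule` are hypothetical names.

```c
/* Assumptions (HW2+, per the thread): PRG ticks at 1024 Hz and AUTO_INT_5
   fires every (257 - start) ticks. */
#define PRG_TICK_HZ 1024U

typedef struct {
    unsigned char fast_start;   /* start value giving a rate >= fps */
    unsigned char slow_start;   /* start value giving a rate < fps (equals fast_start if exact) */
    unsigned short fast_frames; /* frames per cycle played at fast_start */
    unsigned short slow_frames; /* frames per cycle played at slow_start */
} FrameSchedule;

static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Returns 0 on success, -1 if fps cannot be reached with start values 0..255. */
static int plan_schedule(unsigned fps, FrameSchedule *out)
{
    unsigned q, r, g;
    if (fps == 0)
        return -1;
    q = PRG_TICK_HZ / fps;   /* shorter period, in ticks */
    r = PRG_TICK_HZ % fps;   /* frames per second that need the longer period */
    if (q < 2 || q + (r != 0) > 257)
        return -1;
    out->fast_start = (unsigned char)(257 - q);
    if (r == 0) {            /* fps divides 1024: one start value is exact */
        out->slow_start = out->fast_start;
        out->fast_frames = 1;
        out->slow_frames = 0;
        return 0;
    }
    out->slow_start = (unsigned char)(257 - (q + 1));
    g = gcd_u(fps - r, r);   /* shortest repeating cycle */
    out->fast_frames = (unsigned short)((fps - r) / g);
    out->slow_frames = (unsigned short)(r / g);
    return 0;
}
```

For 20 FPS this gives start values 206 and 205 in a 4:1 ratio, so the rate switches every 5 frames instead of in multi-second blocks; for 32 FPS it returns the single value 225.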

It turns out that with the optimized code, the video has no problem running at 32 FPS. I know this is overkill (more frames per second than the original 30 FPS video), but it solves the syncing problem :) Video: https://www.youtube.com/watch?v=WzE-5iha0Gs

One thing I noticed, though: I get an Address Error at the end if I let the video finish on its own, but not when I press ON to break. This doesn't happen on TiEmu; any ideas? It happens at this timestamp: https://youtu.be/WzE-5iha0Gs?t=220
Ah. Something else must be corrupting memory; I had fixed the causes of the Address Errors and other misbehaviour I saw...
With VTI, sure, but with TiEmu, it's uncommon to hit bugs that show up on real hardware but not on the emulator.
I swapped the order of the HeapUnlock loop and the GKeyFlush/OSClearBreak calls, and also added an OSCheckBreak() call before OSClearBreak(). One of these changes seems to have got rid of the Address Error, but there is still a strange fade-out at the end of the video.
The call to OSCheckBreak() before OSClearBreak() really ought to be superfluous.
I may indeed have botched the teardown sequence previously, but I fear that there's still a dragon hiding somewhere...
  