DJ Omnimaga wrote:
Since the last two versions at least, when I run the game a second time it freezes on the first frame of gameplay, forcing me to pull a battery. (fx-CG10)

It seems to be a problem related to serial communication. I think I fixed it.
Could you try this version?: https://www.dropbox.com/s/mo3pbvo4pocrwut/racing.g3a?dl=0
It works now! :) Something I noticed with the single-player version, by the way, is that when my fx-CG10 is not overclocked, the game actually runs slower than its multiplayer counterpart instead of the other way around. :O

EDIT: It appears to be very sensitive to peripheral clock speed, so I have set up PTunes with settings that get close to the maximum peripheral clock speed, and now the single-player version is faster again.
It looks like using gint is making it run slower. I don't know exactly why, but I hope Lephe can explain this.

On the fx-CG 50, both versions (with multiplayer disabled) run at about the same speed, despite the fact that one is using DMA and the other is using 32-bit writes.
On the fx-CG 20, performance decreases from 9 to 6 FPS when using gint.

The multiplayer makes it slower on the fx-CG 50 because it acts like an FPS limit. But on my fx-CG 20 (and probably on your 10 too), that limit is higher than the FPS, so it doesn't change the performance at all.
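To illustrate why the same limit hurts one calculator and not the other, here is a minimal sketch. It assumes the serial exchange simply caps the frame rate; `effective_fps` and the cap value used in the usage note are made up for illustration and are not taken from the game's code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model: the frame rate is bounded both by how fast the
// renderer runs and by how often the serial exchange can complete.
double effective_fps(double render_fps, double sync_limit_fps) {
    assert(render_fps > 0 && sync_limit_fps > 0);
    return std::min(render_fps, sync_limit_fps);
}
```

With a made-up cap of 15 FPS, a calculator rendering at 22 FPS drops to 15, while one rendering at 9 FPS stays at 9.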

I'll have to clarify this on the github repo.
Also, I think it's possible to make multiplayer not affect performance on the fx-CG 50, so I'll try that soon.

Maybe instead of a single-player and a multiplayer version, I'll have an fx-CG 10/20/50 version and an fx-CG 50 version, which has no borders (if it's not possible to make the gint version faster than the prizm version).
Wait, the gint version being slower would take the cake. After all the work, how ironic. xD

How do I build with multiplayer disabled so I can test it?

I assume on the fx-CG 10/20 the default setting for Pϕ (which drives the DMA) is less favorable than on the fx-CG 50, compared to the Iϕ/Bϕ pair (which influences normal writes to RAM). I don't know for sure as I don't have that calculator.
Lephe wrote:
How do I build with multiplayer disabled so I can test it?

Just comment out every block with "Serial" in it in src/main.cpp

I don't know if this is what you're talking about but, if I open Pover (an overclocking utility), I see this:
fx-CG 50:
- Freq: 58 MHz
- ICLK: 1/2
- PCLK: 1/4
- BCLK: 1/3
- SCLK: 1/3

fx-CG 20:
- Freq: 58 MHz
- ICLK: 1/3
- PCLK: 1/6
- BCLK: 1/4
- SCLK: 1/4
I'd avoid Pover, as it's outdated by now and I think it caused issues with some early hardware revisions of the fx-CG10/20, like OverClui could (my memory might be playing tricks on me, but I seem to recall horror stories about bricked calcs). I think one is based on the other. For the fx-CG10/20 I find that using PTunes2 is safer, since it's more up to date. I believe it's PTunes3 or something like that for the CG50. I only had it crash once, when I accidentally overclocked the CPU past the limit.
Hello All,

For your information, these are the default CG50 and CG20 parameters from PTune3 / PTune2

CG50 (PTune3):
PLL x16 --> 232.31MHz
IFC : 1/2
CPU 116.15MHz
SFC : 1/4 roR 8 --> 58.07MHz
BFC : 1/4 CL 2 --> 58.07MHz
PFC : 1/8 --> 29.03MHz


CG20 (PTune2):
PLL x16 --> 235.93MHz
IFC : 1/4
CPU 58.98MHz
SFC : 1/8 roR 8 --> 29.49MHz
BFC : 1/8 raR 2 --> 29.49MHz
PFC : 1/16 raW=R --> 14.75MHz
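
The frequencies in both lists are just the PLL output divided by each divider. A minimal sketch of that arithmetic follows; the `Clocks` struct and `derive_clocks` function are names invented for this example and do not come from PTune or gint.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the derived frequencies, in MHz.
struct Clocks {
    double cpu_mhz;
    double bus_mhz;
    double peripheral_mhz;
};

// Each clock is the PLL output divided by its divider (IFC, BFC, PFC).
Clocks derive_clocks(double pll_mhz, int ifc_div, int bfc_div, int pfc_div) {
    assert(ifc_div > 0 && bfc_div > 0 && pfc_div > 0);
    return { pll_mhz / ifc_div, pll_mhz / bfc_div, pll_mhz / pfc_div };
}
```

For example, `derive_clocks(232.31, 2, 4, 8)` reproduces the CG50 values and `derive_clocks(235.93, 4, 8, 16)` the CG20 ones. With these defaults the peripheral clock is half the bus clock on both models; the CG20 simply runs everything at about half the CG50 speed.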

Cheers

Sly

PS: for your information, Lephe and I are in the process of fully supporting OC in gint; it should arrive soon.
Multiplayer doesn't have any impact on performance anymore, so now the multiplayer version can be as fast as the single-player version. I'll make a new release soon.
I'll keep the gint version because it doesn't have borders, despite it not being faster.

Compared to the prizm version, the gint version is slower on the fx-CG 10/20 and runs at the same speed on the fx-CG 50 (but it should be faster because of DMA).
I still don't know why, but I know that it's not because of using the DMA (I tried not using it).
Other than that, there doesn't seem to be anything relevant to performance that changes, other than gint itself.

slyVTT wrote:
PS: for your information, Lephe and I are in the process of fully supporting OC in gint; it should arrive soon.

By the way, will this be safe? On some fx-CG 50 calculators, PTune3's F5 level doesn't work.
Will gint have some setting that works for every calculator? Or a way to use the maximum possible overclocking without crashing?
duartec wrote:

slyVTT wrote:
PS: for your information, Lephe and I are in the process of fully supporting OC in gint; it should arrive soon.

By the way, will this be safe? On some fx-CG 50 calculators, PTune3's F5 level doesn't work.
Will gint have some setting that works for every calculator? Or a way to use the maximum possible overclocking without crashing?


As of today, gint supports all PTunes Fx (x = 1..5) modes on both the fx-CG20 and the fx-CG50. I was not aware of such an issue with F5 on the fx-CG50 (and didn't experience it with mine). We don't have "user-defined" parameters implemented yet, maybe in the future, but right now we wanted it to be as easy as possible for programmers. The OC system of gint is robust enough to allow such things in the long run.

OC is currently fully supported on both the fx-CG20 and the fx-CG50 (this is in <gint/clock.h> for gint >= 2.80.0; see the announcement here: https://www.planet-casio.com/Fr/forums/topic13572-65-gint-un-noyau-pour-developper-des-add-ins.html#187840 (sorry, it is in French, but no doubt you can understand the code)). It will arrive very soon on non-color SH4 and on SH3 fx-9860 models; Lephe just needs to do some cleaning on my dirty code :)

The next big step is to support serial (3-pin) communication to allow multiplayer or possibly sound. We are currently working on this.
Sorry for the delay, I've been wanting to investigate why the gint version is slower (can't have that!). I've been unable to build the PrizmSDK version for a while (just crashed), and realized tonight that I'd changed the PrizmSDK linker script to output ELFs a while back for debugging CGDoom. So I've got a build to work (still v1.1) and I'll have a look at performance next. Maybe I can find an optimization or two while I'm at it, though I believe from previous analysis that RAM might be limiting.
I said the gint version ran at the same speed on the fx-CG 50, but now I tested it again, and it actually runs a bit faster, as expected (from about 6.0 to 5.3 ticks per frame).
But it still runs slower on the fx-CG 20 (from about 15.5 to 17.8 ticks per frame).

I wanted to try allocating the depth buffer in the same way as in the prizm version (inside main, and not static). But then I get undefined references to ___cxa_guard_acquire, ___cxa_guard_release, and ___cxa_guard_abort. I also can't do the opposite (make it static on the prizm version): "region 'ram' overflowed".

By the way, I found another problem with gint, but I don't know if it's fixable:
If I exit to the menu (with gint_osmenu), reboot the calculator, and try to open the add-in again, the calculator resets itself.
This also happens with a minimal add-in created by "fxsdk new", on both calculators.
Didn't the fx-CG10/20 have some sort of VRAM and LCD-related bottleneck when it comes to maximum frame rate? If we look at all Casio color games released before the fx-CG50 and Graph 90+E came out, not even one game could break past 21 FPS unless some overclocking was done.
Quote:
I wanted to try allocating the depth buffer in the same way as in the prizm version (inside main, and not static).

Oooh, that is not possible with the gint setup. It only works on the PrizmSDK version because you have that single 512-kB block with the data segment at the top and the stack at the bottom, meaning anything you don't use for data can be used for stack. That is how you can get away with allocating such a huge array on the stack.

I personally felt that this was a huge waste of space because common programming practice keeps the stack small and allocating stuff as globals doesn't allow for memory reuse when objects aren't being used all the time. So I limited the stack size to 16 kB and configured malloc() to use any memory between the end of the data segment and the start of the stack, which is usually > 450 kB. If you tried to allocate the depthBuffer on the stack, you would intersect that arena and break things.
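
With that layout, a large buffer that should not be a global would come from the heap instead of the stack. Here is a minimal sketch, assuming a 396x224 render area and a 4-byte depth value; the `DepthValue` type and `alloc_depth_buffer` are names made up for this example, not the game's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed dimensions and element type; the real project defines its own.
constexpr int RENDER_WIDTH = 396;
constexpr int RENDER_HEIGHT = 224;
using DepthValue = std::int32_t;

// Allocate the depth buffer from the malloc() arena rather than the stack.
// Returns nullptr if the arena cannot satisfy the request.
DepthValue *alloc_depth_buffer(std::size_t count) {
    assert(count > 0);
    return static_cast<DepthValue *>(std::malloc(count * sizeof(DepthValue)));
}
```

Under these assumptions the buffer takes 396 * 224 * 4 = 354,816 bytes, which would fit in an arena of more than 450 kB but not in a 16 kB stack. The caller releases it with `free()` when the renderer shuts down.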

Quote:
But then I get undefined references to ___cxa_guard_acquire, ___cxa_guard_release, and ___cxa_guard_abort.

That is in -lstdc++, though I have absolutely no idea why it'd pop up there.
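
One possible explanation, which is only a guess since the failing code isn't shown: GCC emits calls to the `__cxa_guard_*` functions around function-local static variables whose initializer has to run at run time, so that the initialization happens exactly once. The sketch below shows that pattern on a toy variable; `compute_initial_value` and `counter` are hypothetical names.

```cpp
#include <cassert>

int init_calls = 0;

// Run-time initializer, so the static below cannot be constant-initialized.
int compute_initial_value() {
    ++init_calls;
    return 42;
}

int &counter() {
    // Function-local static with a run-time initializer: the compiler guards
    // the first initialization, which is where the __cxa_guard_* calls come
    // from (unless the code is built with -fno-threadsafe-statics).
    static int value = compute_initial_value();
    return value;
}
```

If this is the cause, either linking the C++ runtime, building with GCC's `-fno-threadsafe-statics`, or giving the variable a constant initializer should make the references go away.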

Quote:
By the way, I found another problem with gint, but I don't know if it's fixable:
If I exit to the menu (with gint_osmenu), reboot the calculator, and try to open the add-in again, the calculator resets itself.
This also happens with a minimal add-in created by "fxsdk new", on both calculators.

I think it's the longest-standing "bug" in the project. Power off is a pretty violent operation; wipes some memory, interacts quite a bit with the add-in. Haven't investigated yet exactly what happens that breaks the add-in, but I'm heavily suspecting that poweroff compatibility means we have to sacrifice on-chip memory and other things, which I'm not quite happy to do. So maybe fixable, but I'm not very hopeful. Just exit the add-in when we leave for the menu... x_x

Quote:
Didn't the fx-CG10/20 have some sort of VRAM and LCD-related bottleneck when it comes to maximum frame rate? If we look at all Casio color games released before the fx-CG50 and Graph 90+E came out, not even one game could break past 21 FPS unless some overclocking was done.

The situation hasn't really changed; the fx-CG 50 is just permanently overclocked by fx-CG 10/20 standards. The bottleneck is still LCD update time and VRAM writing, although everything is obviously faster by default.

-

So I went around and looked again for possible performance improvements. Your bottleneck is quite obviously dumping tons of data into giant arrays (with a big emphasis on the terrible write performance of the RAM chip).

Most of the things I tried didn't help... I only managed to shave 2 ms off z-buffer clearing by observing how insane it is that you clear 4 bytes × 88700 pixels when in fact only ~3000 pixels have depth information to begin with.

We can improve on that by only writing pixels that have changed. This works because writes are really slow, so we have all the time in the world to read pixels ahead from cache during a write, if they are prefetched. This is my updated Rasterizer::reset():


Code:
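      // Clear value written to every depth entry
      fp v = -1;
      for(int i = 0; i < RENDER_WIDTH*RENDER_HEIGHT; i++) {
         // Prefetch the entry 8 elements ahead so the read below hits the cache
         __asm__("pref @%0" :: "r"(&depthBuffer[i+8]));
         // Reads are cheap and writes are slow: only store when the value differs
         if(depthBuffer[i].i != v.i)
            depthBuffer[i] = v;
      }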


Normal clear by CPU is 11 ms, with DMA 9 ms, with this 7 ms.
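
For testing the idea off-calculator, here is a portable sketch of the same write-only-if-changed clear. It swaps the SH4 `pref` instruction for the GCC/Clang `__builtin_prefetch` builtin and uses a plain `int32_t` buffer instead of the game's `fp` type, so treat it as an approximation; `clear_if_changed` is a name made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reset every entry to `cleared`, storing only where the value differs.
// Returns the number of stores actually performed.
std::size_t clear_if_changed(std::vector<std::int32_t> &buf,
                             std::int32_t cleared) {
    assert(!buf.empty());
    std::size_t writes = 0;
    for (std::size_t i = 0; i < buf.size(); i++) {
        // Hint the cache about the entry 8 elements ahead, staying in bounds.
        if (i + 8 < buf.size())
            __builtin_prefetch(&buf[i + 8]);
        if (buf[i] != cleared) {
            buf[i] = cleared;
            writes++;
        }
    }
    return writes;
}
```

The return value makes it easy to check how many entries were actually dirty in a frame, which is the number that decides whether this beats a plain clear.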

It's not much; tbh your whole situation is quite depressing. You seem to have way too much data killing you. I didn't try anything that would impact the design of the program but I'd recommend trying a 16-bit z-buffer. Best next would be to get rid of the VRAM but that's too big of a change. x_x

Edit: Forgot to mention I don't have an fx-CG 10/20 so I'm unable to investigate the performance difference there, sorry.
Instead of poweroff compatibility, would it be possible to at least exit the add-in without restarting?

Lephe wrote:
I'd recommend trying a 16-bit z-buffer

I'm already using an 8-bit depth buffer, but it looks like you're using an older version.

Could your improvement with prefetching work for drawing to the screen too?
Most of the screen doesn't change much, but maybe it wouldn't work anyway because it doesn't have a cache?

About the fx-CG 20, I made some measurements.
- Clearing the screen on the fx-CG 20 is slower when using DMA (about 1.5x). It's faster on the fx-CG 50 though.
- Clearing the depth buffer is faster when using DMA, but it's still slower than on the prizm version (without DMA).
But I guess you can't fix these issues if you don't have one.
Anyway, I think it's worth sacrificing fx-CG 20 performance to improve it on the fx-CG 50, and get a bigger screen, so I'll probably switch to the gint version once you add serial communication to it.
I have now released version 1.4 with textures and new, rounder models for the sun, the cones, and the car's wheels.

Here's what it looks like:


The texture mapping uses linear interpolation, which isn't perspective-correct, but it's a lot faster, as it doesn't need any divisions per pixel. Because of this, textures wouldn't work well on objects that are both tilted relative to the camera and take up a lot of screen space, unless they're tessellated, so only the car is textured.
However, this is still slow (FPS drops from about 22 to 18), because texturing adds some operations per pixel.
It could be more optimized, but I think it's fast enough.
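
To show what "no divisions per pixel" means, here is a minimal sketch of linear (affine) interpolation of one texture coordinate along a scanline, using 16.16 fixed point. This is not the game's rasterizer; `affine_scanline` and its parameters are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// 16.16 fixed-point texture coordinate.
using fixed = std::int32_t;
constexpr fixed FIX_ONE = 1 << 16;

// Compute the texel column for each of `width` pixels, with the texture
// coordinate going linearly from u0 (inclusive) to u1 (exclusive).
// There is one division per scanline and only an addition per pixel.
std::vector<int> affine_scanline(int width, int u0, int u1) {
    assert(width > 0 && u0 >= 0 && u1 >= u0);
    std::vector<int> texels(width);
    fixed u = u0 * FIX_ONE;
    fixed du = ((u1 - u0) * FIX_ONE) / width;  // step per pixel
    for (int x = 0; x < width; x++) {
        texels[x] = u >> 16;  // integer part of the coordinate
        u += du;
    }
    return texels;
}
```

A perspective-correct version would have to interpolate u/z and 1/z and divide them at every pixel (or every few pixels), which is the cost the linear version avoids.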

I don't plan to add any more optimizations or improvements to this game.
About the TI-84 plus CE version, I don't think it would be possible without some big changes because it doesn't seem to have enough memory for the depth buffer.

Also, here are some ideas for the renderer that I won't implement in this game (but maybe in another game if I get an idea):
- Don't use a depth buffer. Instead, use an ordering table to order triangles by their depth and draw them in the correct order. This is mostly implemented in a refactoring branch in the git repo, but the version in that branch is slow because it calls malloc hundreds of times. I think this can be fixed by allocating enough memory once and using that buffer to store the triangles in the ordering table.
- Instead of multiplying the texture color by the brightness for every pixel, store many textures for different brightness levels. This might cost too much memory though.
- Instead of checking if the texture coordinates are between 0 and 1 for every pixel, change the values used when interpolating so that they don't leave that range. Or maybe just don't texture the triangle if they would leave the range (if this only happens in triangles that aren't very visible).
- Use Gouraud shading for better graphics (calculate lighting per vertex and interpolate vertex colors).
- Add detail to floors or walls efficiently using triangles with different colors.
- Only texture some parts of a model (use triangle colors where possible).
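
The first idea above (an ordering table backed by memory allocated once) could look roughly like this. It is a sketch, not the code from the refactoring branch: the `Triangle` record, the `OrderingTable` class, and the convention that a larger depth value means farther away are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Triangle { int id; int depth; };  // hypothetical minimal record

// One bucket per depth level, backed by a pool allocated once in the
// constructor, so inserting triangles during a frame never calls malloc.
class OrderingTable {
public:
    OrderingTable(int levels, std::size_t max_triangles)
        : heads_(levels, -1), pool_(max_triangles),
          next_(max_triangles, -1), used_(0) {}

    // Start a new frame; keeps the memory, only resets the bookkeeping.
    void clear() {
        heads_.assign(heads_.size(), -1);
        used_ = 0;
    }

    // Returns false when the preallocated pool is full.
    bool insert(const Triangle &t) {
        assert(t.depth >= 0 && t.depth < static_cast<int>(heads_.size()));
        if (used_ == pool_.size()) return false;
        int slot = static_cast<int>(used_++);
        pool_[slot] = t;
        next_[slot] = heads_[t.depth];  // push onto the bucket's linked list
        heads_[t.depth] = slot;
        return true;
    }

    // Visit triangles from the farthest bucket to the nearest one.
    template <typename F>
    void draw_back_to_front(F &&draw) const {
        for (int d = static_cast<int>(heads_.size()) - 1; d >= 0; d--)
            for (int s = heads_[d]; s != -1; s = next_[s])
                draw(pool_[s]);
    }

private:
    std::vector<int> heads_;      // first pool slot of each bucket, or -1
    std::vector<Triangle> pool_;  // triangle storage, allocated once
    std::vector<int> next_;       // next slot in the same bucket, or -1
    std::size_t used_;
};
```

Triangles within the same bucket are not sorted against each other, so the number of depth levels trades memory for ordering accuracy.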