ReC98

🚚 Summary of:
P0330 TH03 RE (Enemies / fireballs / explosions, part 1: Preparations + enemy state)
P0331 TH03 decompilation (Enemies / fireballs / explosions, part 2: Fireball state / enemy functions)
P0332 TH03 decompilation (Enemies / fireballs / explosions, part 3: Enemy functions) + Tooling (TH03 ENEDAT.DAT decompiler)
P0333 TH03 decompilation (Enemies / fireballs / explosions, part 4: Fireball functions + Chaining + Boss Attacks/Panics)
P0334 TH03 decompilation (Enemies / fireballs / explosions, part 5: Collision detection + Combos)
P0335 TH03 research (Enemies / fireballs / explosions, part 6: Score reduction and extend glitches) / Twitter→Fediverse migration, part 2 + Website (Multi-row tabs)

💰 Funded by:
[Anonymous], Ember2528, LeyDud

Most of my blog posts are way too long because they tend to cover multiple and often unrelated topics. I'm well aware that this blog would reach way more people if I just split these posts into multiple smaller and more thematically focused ones. This time, however, there would have been absolutely no way I could have split off anything about TH03's enemy, fireball, explosion, chaining, and combo systems while still leaving you with a coherent understanding. If this were a multi-post series, you'd be clicking back and forth to even just fully understand one of those systems in its entirety.

Really, what is this mess? I've procrastinated RE work on any of these systems ever since 📝 2022, when it became clear just how much enemies, fireballs, and explosions are connected with each other. Because both enemies and fireballs are things that can explode, ZUN decided to use the same 34-byte structure for them and the explosions they turn into, forming a single 64-element array of explodable entities.
This might remind you of 📝 TH04's and 📝 TH05's custom entity structures. Aren't these even used for 6 different things in each of the two games? Well, the difference between those games and TH03 is like night and day:

  • The custom entity structure is only 26 bytes large, and uses much more consistent semantics for most of the fields. Only 3 (TH04) or 4 (TH05) fields are truly generic and are used in completely different ways within the custom subclasses…
  • …if these fields are even used at all. These subclasses are always used in isolation, by a small and self-contained gameplay system. At first, TH03 doesn't seem too bad, because explosions aren't actually much of a standalone system. "Exploding" is simply another state that both enemies and fireballs can be in, and both of them update and render their explosions independently from each other, despite using the same sprites. Collision detection, however, is shared, and suddenly introduces a shared "base class" that reaches way further than it has any right to. (Also, enemies are by no means a small system.)
  • But then, all of this also interacts with TH03's chaining system. Which is even less well-defined and isolated, and only exists as ad-hoc code within the collision detection of all three systems. Hence, I couldn't RE the chaining system in isolation because there was no way of deducing the intent behind these very specific calculations without knowing the conditions in the enemy/fireball/explosion mess that triggered them. :godzun:

This was very much not code written to be read and understood. This was code that naturally evolved from an explorative process of playing around with the data, with the goal of creating an exciting game whose exact mechanics are hard to figure out from the outside.
But the result is a ≥2,200-line mess of tangled spaghetti that not only accesses the ostensibly same structure in subtly different ways, but also splashes an unhealthy dose of mutable global state on top, adding an extra layer of intractability. It's so convoluted that I would have loved to refactor it immediately before continuing with anything else in this project. It certainly makes sense to do so earlier rather than later because TH03 is the single most ideal PC-98 Touhou game for these sorts of sweeping architecture changes. Its highly dynamic gameplay makes it highly likely for small accidental deviations from ZUN's gameplay logic to snowball into wildly different game states, and we can easily observe these by comparing the demo replays against their 📝 original state. Alas, we're still 719 potential memory references away from position independence making that possible.

This has been the single hardest to understand piece of game logic in all of PC-98 Touhou. The RE content of this post alone beats 📝 the previous record holder for the post with the largest amount of PC-98 Touhou RE content by 2.38×, and would be the second-heaviest post by HTML size that this project has seen so far, just behind 📝 the giant Shuusou Gyoku waveform BGM project. What else could possibly be left in these games that would be harder than this? A lot of the not yet RE'd code is either an isolated gameplay system or isolated to specific characters or bosses, and 📝 a lot of script code will be highly repetitive… yeah, I'm pretty certain that PC-98 Touhou won't have a single remaining RE task that will take as long to complete as this one.
How do you structure a blog post that covers so many intertwined systems? Let's try building this up in the logical order from most to least intertwined:

  1. An important note about durations in TH03
  2. TH03's chaining system
  3. Difficulty-specific tuning
  4. TH03's combo system
  5. TH03's enemy formation scripts
  6. TH03's enemies themselves
  7. TH03's explosions
  8. TH03's fireballs
  9. TH03's score reduction and extend glitches
  10. Completing the Fediverse migration

Before I can talk about anything else, I need to clarify the most important fact about in-game durations in TH03:

Any gameplay-related frame time or real-time duration in TH03 does not include the first 27 frames of every WARNING!! popup shown throughout a round. These frames freeze gameplay precisely because they run in a separate blocking function that runs its own VSync delay loop.

Obviously, this applies both to this blog and to anything you can read in the code. The gameplay community has been rather confused as far as precise durations are concerned, so I'm going to occasionally link back to this explanation in this post.


With that out of the way, let's start with…

The chaining system

…which orchestrates most of TH03's wildly escalating gameplay. Internally, chaining works like this:

  • Each chain tracks four unsigned 8-bit values:
    • Number of hits. The Hit! counter in the HUD shows the single highest one of these among all active chains. Contrary to what its two digits might suggest, this internal field is actually clamped at 255, not 99; that's how the defeat screen at the end of each round gets to show the correct overall highest hit count using three digits.
    • Charge values for fireballs (charge_fireball) and Extra Attacks (charge_exatt). These cause the game to fire one of each once the charge hits a certain value (see below), after which they get reset to 0. Note how this means that you can't carry over any excess charge across individual fireballs and Extra Attacks, and that you don't get multiple fireballs or Extra Attacks if charge_* has reached a multiple of the required value.
    • A pellet_and_fireball_value that increases as the chain destroys pellets (+1), blue fireballs (+6), and red fireballs (+9). It controls how much charge explosions add to charge_fireball and charge_exatt upon destroying pellets and fireballs (not enemies), as shown in the table below.
    Notably, this list doesn't contain the amount of points that each chain contributes to the total bonus shown in the top-left corner of the playfield during an active combo. More on that in the combo section below.
  • Both these charges and the bonuses described below rely on entities being destroyed by explosions. Therefore, destroying individual enemies with no other entities next to them doesn't impact the chain charges or the combo system in any way.
  • The game can track up to 16 chains per player. On the surface, it looks as if these are organized as a ring buffer and assigned to explosions as such, but there are three cases where the game just adds charge to a different and unrelated chain – and yes, this foreshadows the much nastier details later on in this post. Since new chains also only reset their hit counter and the pellet/fireball value and retain their previous charges, they're best thought of as a regular, mutably written piece of state.
  • Internally, these values are stored as a structure of four 16×2-element arrays, which will also become very relevant later on.
  • New chains are started by killing enemies and creating new explosions with anything other than an existing explosion. The explosion is then linked to that specific chain slot.
  • These explosions will then hit any pellets, enemies, and fireballs within their hitbox. Destroying the latter two with an explosion will then create new explosions that are linked to the same chain slot.
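To make the per-chain state a bit more tangible, here is a minimal sketch of how such a structure could look. Only the four fields, the 16 chains per player, and the reset behavior come from the description above; the type and function names, the `[chain][player]` index order, and the assumption that a reset hit counter starts at 0 are mine and not taken from the actual codebase.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names and index order; only the field list is from the text.
constexpr int CHAIN_COUNT = 16;
constexpr int PLAYER_COUNT = 2;

struct Chains {
    uint8_t hits[CHAIN_COUNT][PLAYER_COUNT];
    uint8_t pellet_and_fireball_value[CHAIN_COUNT][PLAYER_COUNT];
    uint8_t charge_fireball[CHAIN_COUNT][PLAYER_COUNT];
    uint8_t charge_exatt[CHAIN_COUNT][PLAYER_COUNT];
};

// Starting a new chain only resets the hit counter and the pellet/fireball
// value. Both charge fields keep whatever was left in the slot.
inline void chain_start(Chains& c, int slot, int pid) {
    assert(slot >= 0 && slot < CHAIN_COUNT);
    assert(pid >= 0 && pid < PLAYER_COUNT);
    c.hits[slot][pid] = 0; // assumed start value
    c.pellet_and_fireball_value[slot][pid] = 0;
}

// The hit counter is clamped at 255 rather than 99.
inline void chain_add_hit(Chains& c, int slot, int pid) {
    assert(slot >= 0 && slot < CHAIN_COUNT);
    assert(pid >= 0 && pid < PLAYER_COUNT);
    if (c.hits[slot][pid] < 255) {
        c.hits[slot][pid]++;
    }
}
```

Calling `chain_start()` on a slot that still holds leftover charge leaves `charge_fireball` and `charge_exatt` untouched, which is exactly the "mutably written piece of state" behavior described in the list.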

The charge values added for destroying enemies with explosions require further explanation, but we can already cover the pellet_and_fireball_value-dependent values added when destroying those two:

Pellet/fireball value Chain effect
≥ 00 and < 04 charge_fireball +2
≥ 04 and < 10 charge_fireball +5
≥ 10 and < 20 charge_fireball +2, charge_exatt +1
≥ 20 charge_exatt +2

A particularly interesting detail about those: If more than one explosion hits a pellet or fireball within the same frame, these charges are added separately for each colliding explosion, to their respective chains. For red fireballs, TH03 then also adds a final +1 to the charge_exatt value of the chain whose colliding explosion has the highest index within the overall array of explodable entities.
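The table above boils down to a simple threshold lookup. The following sketch assumes that the thresholds are decimal numbers, and both `ChargeDelta` and the function name are made up for illustration. It also only models the lookup itself, not the order in which the game updates pellet_and_fireball_value relative to it.

```cpp
#include <cstdint>

// Hypothetical helper type: charge added to a chain for one colliding explosion.
struct ChargeDelta {
    uint8_t fireball; // added to charge_fireball
    uint8_t exatt;    // added to charge_exatt
};

// Thresholds taken from the table above, assumed to be decimal.
inline ChargeDelta charge_for_pellet_or_fireball(uint8_t pellet_and_fireball_value) {
    if (pellet_and_fireball_value < 4) {
        return { 2, 0 };
    }
    if (pellet_and_fireball_value < 10) {
        return { 5, 0 };
    }
    if (pellet_and_fireball_value < 20) {
        return { 2, 1 };
    }
    return { 0, 2 };
}
```

Per the paragraph above, this lookup would run once for every explosion that collides with the pellet or fireball on that frame, each time against that explosion's own chain.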

Explaining the combo system would be the logical next step, but that system has a dependency on…

Difficulty-specific tuning

Where we get right to the final variable that would 📝 define TH03's difficulty in a round of netplay:

round_speed

This is a Q4.4 fixed-point value that is limited to a range from 0.0 to 7.9375 inclusive, increments by 0.0625 every 64 frames, and starts out at the following values:

Difficulty  Initial round_speed
Easy        (Round ID × 1.0)
Normal      (Round ID × 2.0)
Hard        (2.0 + (Round ID × 2.0))
Lunatic     6.0
📝 Demo     4.0
With the round ID being 0-based and representing either the current VS Mode round or the number of times a Story Mode stage has been repeated.

This variable is then used to derive a whole variety of speeds and limits:

  • Speed of pellets and fireballs while they transfer to the other playfield: (4.375 + ⌊round_speed/3⌋)
  • Speed of the single pellet fired when killing an enemy with an explosion whose chain has a hit count of ≥3 and ≤(14 - (⌊round_speed/2.0⌋ × 2)), after it transferred to the other player's field: (1.5 + (⌊3 × round_speed/16))
  • Additional fireball speed added on top of the randomized base speed between 1.25 and 3.25: ⌊round_speed/2⌋
  • Required charge_exatt within a chain to fire an Extra Attack after destroying a fireball with an explosion: (7 - ⌊round_speed/2.0⌋)
  • Required charge_exatt within a chain to fire an Extra Attack after destroying a fireball with a regular player shot or Charge Shot, or an enemy with an explosion: (6 - ⌊round_speed/2.0⌋)
  • Required charge_fireball within a chain to fire a fireball after destroying an enemy, fireball, or pellet with an explosion: (12 - ⌊round_speed/1.0⌋)
  • Two other things in character-specific Extra Attack code I haven't RE'd so far

Yup – the difficulty setting in the Option menu merely controls where all of these values start out at. After 6,144 frames of gameplay, or 1:49 minutes, even an Easy round will have accelerated to Lunatic levels.
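Here is a rough sketch of that progression on raw Q4.4 values, where 1.0 corresponds to a raw value of 16. All names are hypothetical. Incrementing on every multiple of 64 frames and clamping the initial value are assumptions of this sketch, and the integer divisions by 1.0 and 2.0 are modeled as divisions of the raw value by 16 and 32.

```cpp
#include <cstdint>

// Raw Q4.4: real value × 16. 0x7F is 7.9375, the stated upper limit.
constexpr uint8_t ROUND_SPEED_MAX = 0x7F;

enum class Difficulty { Easy, Normal, Hard, Lunatic, Demo };

inline uint8_t round_speed_clamp(unsigned raw) {
    return static_cast<uint8_t>((raw > ROUND_SPEED_MAX) ? ROUND_SPEED_MAX : raw);
}

// Initial values from the difficulty table. `round_id` is 0-based.
inline uint8_t round_speed_initial(Difficulty difficulty, unsigned round_id) {
    unsigned raw = 0;
    switch (difficulty) {
    case Difficulty::Easy:    raw = (round_id * 16);      break;
    case Difficulty::Normal:  raw = (round_id * 32);      break;
    case Difficulty::Hard:    raw = (32 + round_id * 32); break;
    case Difficulty::Lunatic: raw = 96;                   break;
    case Difficulty::Demo:    raw = 64;                   break;
    }
    return round_speed_clamp(raw);
}

// +0.0625 (= 1 raw unit) every 64 frames, up to the limit.
inline uint8_t round_speed_after(uint8_t initial_raw, unsigned frames) {
    return round_speed_clamp(initial_raw + (frames / 64));
}

// Charge requirements from the list above.
inline int fireball_charge_required(uint8_t raw) {
    return (12 - (raw / 16));
}
inline int exatt_charge_required_after_explosion(uint8_t raw) {
    return (7 - (raw / 32));
}
inline int exatt_charge_required_after_shot(uint8_t raw) {
    return (6 - (raw / 32));
}
```

With these definitions, an Easy round at round ID 0 reaches Lunatic's initial raw value of 96 after exactly 96 × 64 = 6,144 frames, matching the number above.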

enemy_speed

Whenever the game spawns a new formation, it updates this variable by integer-dividing round_speed by 2.0, resulting in a variable that ranges from 0 to 3 inclusive.
This value is then used to scale several speed-related variables using rather weird formulas. For example, ZUN scales movement durations using a formula of

(⌊duration/2⌋ × 3) - ⌊(enemy_speed × duration)/6⌋

Which looks like you could simplify it down to

(duration × (9 - enemy_speed))/6

But as the ⌊floor signs⌋ indicate, all of these divisions operate, once again, on integers, where additional divisions lead to additionally truncated remainders. In this case, they introduce an error of ±1 compared to the simplified formula at regular intervals. Such an error might look like ZUN tried to correctly round the non-truncated floating-point result, but that result would sometimes be even further away from what you get from the janky double-division formula. Let's take a duration of 25, for example:

enemy_speed  Simplified, real  Double-divided
0            37.500            36
1            33.333            32
2            29.166            28
3            25.000            24

Hence, this makes no sense on any level. The simplified expressions would take up less space in both the C++ code and the binary and execute faster. :zunpet:
The same happens for the speeds that enemies move at. In the code, ZUN uses a formula of

⌊(enemy_speed × speed)/9⌋ + ⌊(2 × speed)/3⌋

which looks like

(speed × (enemy_speed + 6))/9

but actually isn't, and suffers from the same off-by-one errors at regular intervals.
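The difference between the double-divided and the simplified formulas is easy to reproduce with plain integer math. The function names below are made up, and the sketch uses `int` for everything because the exact types in the original code don't matter for non-negative inputs.

```cpp
#include <cassert>

// ZUN's formula for movement durations, with two separate truncating divisions.
inline int duration_tuned(int duration, int enemy_speed) {
    assert(enemy_speed >= 0 && enemy_speed <= 3);
    return (((duration / 2) * 3) - ((enemy_speed * duration) / 6));
}

// The simplified single-division variant, truncated to an integer.
inline int duration_simplified(int duration, int enemy_speed) {
    assert(enemy_speed >= 0 && enemy_speed <= 3);
    return ((duration * (9 - enemy_speed)) / 6);
}

// Same idea for enemy movement speeds.
inline int speed_tuned(int speed, int enemy_speed) {
    assert(enemy_speed >= 0 && enemy_speed <= 3);
    return (((enemy_speed * speed) / 9) + ((2 * speed) / 3));
}

inline int speed_simplified(int speed, int enemy_speed) {
    assert(enemy_speed >= 0 && enemy_speed <= 3);
    return ((speed * (enemy_speed + 6)) / 9);
}
```

For a duration of 25, `duration_tuned()` returns 36, 32, 28, and 24 for the four enemy_speed values, while `duration_simplified()` returns 37, 33, 29, and 25. Similarly, a raw speed of 16 at enemy_speed 1 gives 11 with the double division and 12 with the simplified formula.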

The final and most complex enemy_speed-dependent formula is used for tuning the angles for wavy or circular motions. This time around, ZUN actually used 16-bit angles, treating each enemy's angle field as effectively a Q8.8 fixed-point value. The movement vectors themselves are still calculated by indexing master.lib's 8-bit sine and cosine lookup tables with the top 8 bits of such an angle, but the extra 8 bits below add extra precision when adding the tuned angle_speed on every frame of such a motion.
This extended value range allowed ZUN to reduce the impact of enemy_speed to, literally, a very small degree:

⌊(512 × angle_speed)/3⌋ + ⌊(enemy_speed × ⌊(512 × angle_speed)/3⌋)/9⌋

This formula would have benefited most from a simplification to

((512 × angle_speed)/3) + ((512 × angle_speed × enemy_speed)/27)

which would have only introduced a maximum discrete error of 1.7 per frame, or an error of 0.0094° when converted to 360° angles. But the more critical issue with this formula lies in the first term, ⌊(512 × angle_speed)/3⌋. Multiplying an 8-bit angle_speed by 512 effectively left-shifts the value by 9 bits, requiring 17 bits to store all possible resulting values. However, ZUN then assigns this temporary result to a signed 16-bit variable, causing an overflow into negative numbers – and thus, an incorrectly reversed enemy rotation – as soon as angle_speed reaches ≥0x40, rather than the ≥0x80 where the sign is actually supposed to flip.
Then again, who would seriously add ±90° to an enemy's movement angle per frame? -0x08 and +0x08 are the highest speeds used in the original scripts, so this counts as neither a quirk nor a landmine in my book.
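The overflow can be demonstrated by emulating the signed 16-bit temporary on a modern compiler. This is only a sketch under a few assumptions: angle_speed is treated as a signed 8-bit value, the wraparound is modeled explicitly instead of relying on a narrowing cast, and divisions truncate toward zero as C++ integer division does. The function names are hypothetical.

```cpp
#include <cstdint>

// Emulates assigning a wider value to a signed 16-bit variable with
// two's complement wraparound.
inline int32_t wrap_to_int16(int32_t value) {
    const uint32_t low = (static_cast<uint32_t>(value) & 0xFFFFu);
    return ((low >= 0x8000u)
        ? (static_cast<int32_t>(low) - 0x10000)
        : static_cast<int32_t>(low)
    );
}

// First term of the formula: (512 × angle_speed), squeezed into 16 bits,
// then divided by 3.
inline int32_t angle_speed_base(int8_t angle_speed) {
    return (wrap_to_int16(512 * angle_speed) / 3);
}

// Full tuned per-frame delta for the 16-bit (Q8.8) angle.
inline int32_t angle_speed_tuned(int8_t angle_speed, int enemy_speed) {
    const int32_t base = angle_speed_base(angle_speed);
    return (base + ((enemy_speed * base) / 9));
}
```

With the script maximum of 0x08, the base term is 1365 and grows to 1820 at an enemy_speed of 3. At 0x3F, the base term is still a positive 10752, but 0x40 multiplies out to 32768, wraps to -32768, and flips the rotation direction.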


But now we can look at…

The combo system

As you might have already guessed from the list of per-chain fields, the game indeed only tracks the total combo bonus for each player as a single divided-by-10 16-bit value that every chain with ≥2 hits adds onto. For clarity, I'll only use the displayed values in this post. Once the displayed value reaches 655,350, it therefore becomes impossible to cancel the 📝 80-frame cooldown period by adding more bonus points. This also means that you lose all bonus points you would have earned within these last 80 frames.
The bonus value itself grows as you destroy certain entity types with an explosion:

Entity          Bonus
Enemies         (160 × hits)
Pellets         ((        (20 × hits) + (160 × round_speed)) × f)
Blue fireballs  ((1000 + (160 × hits) + (160 × round_speed)) × f)
Red fireballs   ((4440 + (160 × hits) + (160 × round_speed)) × f)
The (160 × round_speed) multiplication effectively converts round_speed into an integer value.

With f being:

  • 0.5 if round_speed ≤ 3.0
  • 1.0 if round_speed > 3.0 and < 6.0
  • 2.0 if round_speed ≥ 6.0

Once again, the pellet and fireball bonuses are awarded multiple times if the entity is destroyed by more than one explosion on the same frame, just like the charges of its chain. Here are some messy screenshots of that case happening:

Screenshot of two explosions destroying a fireball on the same frame of TH03 gameplay
Screenshot of the immediately following frame, showing how the game added 31,600 bonus points for destroying just this one fireball, which is more than twice as high as you'd expect from the table
That's 2 more hits on the Hit! counter, and an additional 1,600 points of score, and Mima demonstrably is not hitting anything other than this one red fireball here.

The bonus difference between the two frames is 31,600, which is exactly

First hit ((4,440 + (14 × 160) + (160 × 7.125)) × 2) +
Second hit ((4,440 + (15 × 160) + (160 × 7.125)) × 2)

With a round_speed of 7.125, we can further deduce that this screenshot was made about 21 seconds into a round at Lunatic difficulty, which checks out when looking at the score. Conclusion: The best way to make things happen in TH03 is to destroy red fireballs with explosions, and preferably multiple explosions in a single frame.
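For anyone who wants to reproduce this calculation, here's a small sketch of the bonus formulas in displayed points. It follows the worked example above in applying the (160 × round_speed) term and the factor f to fireballs as well as pellets. The names are made up, round_speed is passed as a raw Q4.4 value, and the sketch ignores whatever rounding the game's internal divided-by-10 storage might introduce.

```cpp
#include <cstdint>

enum class Entity { Enemy, Pellet, BlueFireball, RedFireball };

// Bonus in displayed points for one explosion destroying one entity.
// `round_speed_raw` is Q4.4, i.e. the real value × 16.
inline long bonus_for(Entity entity, int hits, uint8_t round_speed_raw) {
    if (entity == Entity::Enemy) {
        return (160L * hits);
    }
    long base = 0;
    switch (entity) {
    case Entity::Pellet:       base = (20L * hits);          break;
    case Entity::BlueFireball: base = (1000L + 160L * hits); break;
    case Entity::RedFireball:  base = (4440L + 160L * hits); break;
    default:                                                 break;
    }
    // (160 × round_speed) = (160 × raw / 16) = (10 × raw), always an integer.
    const long sum = (base + (10L * round_speed_raw));
    if (round_speed_raw <= 0x30) { // round_speed ≤ 3.0: f = 0.5
        return (sum / 2);
    }
    if (round_speed_raw < 0x60) {  // round_speed < 6.0: f = 1.0
        return sum;
    }
    return (sum * 2);              // round_speed ≥ 6.0: f = 2.0
}
```

A round_speed of 7.125 is a raw value of 114, so the two hits from the screenshots are `bonus_for(Entity::RedFireball, 14, 114)` and `bonus_for(Entity::RedFireball, 15, 114)`, which add up to the 31,600 points seen in-game.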

Boss Attacks and Panics

People seem to have the impression that this system and its bonus point (or Spell Point) requirements are complicated, so let's describe it all in just a few bullet points:

  • The bonus point requirement is shared among both players.
  • This requirement is only used for firing bonus-based Boss Attacks.
  • Every time a player has increased their bonus by destroying something with an explosion and is not already displaying a WARNING!! popup on the other player's field, the game checks whether to fire a Boss Attack or Boss Panic:
    • Boss Attacks are fired once the player's bonus value has reached the point requirement and if they don't already have an active boss on the other player's field. Then, the game also increases the requirement by 51,200, and resets all non-hit fields of the triggering explosion's chain – that is, pellet_and_fireball_value, charge_fireball, and charge_exatt.
    • Boss Panics are additional 📝 Level 3 Gauge Attacks that are fired for players that do already have an active boss on the other player's field, once per combo if its value has reached 300,000 points. For this condition, bosses are already counted as active on the first frame of their entrance animation.
      Just like the Boss Attacks they depend on, the game delays firing a Boss Panic until all required conditions are met. Panics also do not increase the point requirement for regular Boss Attacks.
  • The initial point requirement is 51,200.
  • Manually firing a Boss Attack by charging the gauge up to Level 4 increases the requirement by 102,400.
  • Destroying a boss, timing it out, or starting a new round will reset the requirement back to 51,200.
  • Internally, the requirement is stored in the same divided-by-10 point format inside an unsigned 16-bit variable. Contrary to the bonus itself, however, none of the additions to the requirement are clamped, so it can and will overflow.
  • The term spell points seems to have been made up by Rukaroa in 2005. Not too bad of a term to use outside of the codebase, considering that it's a nice two syllables and doesn't collide with any other gameplay feature of TH03.
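The requirement bookkeeping from the list above fits into a tiny piece of state. This sketch only models the requirement itself, using hypothetical names, and stores it in the divided-by-10 format so that the unclamped 16-bit overflow becomes visible.

```cpp
#include <cstdint>

// Shared among both players; stored as (displayed points / 10).
struct BossAttackRequirement {
    uint16_t div10 = 5120; // 51,200 displayed

    // Boss Attack fired by reaching the bonus requirement: +51,200.
    void on_bonus_boss_attack() {
        div10 = static_cast<uint16_t>(div10 + 5120u);
    }

    // Boss Attack fired manually via a Level 4 gauge charge: +102,400.
    void on_gauge_boss_attack() {
        div10 = static_cast<uint16_t>(div10 + 10240u);
    }

    // Boss destroyed or timed out, or a new round has started.
    void reset() {
        div10 = 5120;
    }

    unsigned long displayed() const {
        return (div10 * 10UL);
    }
};
```

Starting from the initial 51,200, the 12th consecutive bonus-based Boss Attack without a reset would push the stored value past 65,535 and wrap the displayed requirement around to 10,240.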

Now that we know most of what there is to know about scoring, let's get the whole system going by placing some enemies in more or less chainable positions on the playfield!

Enemy formations

These are loaded from ENEDAT.DAT, which is the only bytecode format in the entirety of TH03. The original game does come with hardcoded limits of up to 16 enemies per formation and up to 24 formations in total, but the actual number of formations is neatly taken from the file itself.
The original ENEDAT.DAT defines 18 different formations, all of which I recorded and posted to the Fediverse along with screenshots of their reconstructed script code, in their original order within the file. Isn't it nice that my primary PR channel can now hold lossless AV1 videos and lossless images that are just as self-hosted as this blog is, and that I can just link to without them taking up space in a blog post? Since there's nothing beyond the internal IDs in the game's memory that you could possibly want to cross-reference these formations with, I chose to number them using the same 0-based scheme. For the same reason, the names aren't important because nothing in the game needs to name these formations, so I just used the names that the gameplay community came up with, taken from these videos by Christian Azinn and KirbyComment.

Obviously, all these source code images mean that I've also written a dumper for the bytecode format, supporting the 13 functions that ZUN actually used in the original ENEDAT.DAT. Since we 📝 no longer build 32-bit Windows binaries as part of our build process, the ReC98 build process currently exclusively compiles this dumper as a DOS binary to bin\Pipeline\enedat.com. This allowed me to use the game's own header files at the expense of not having a trivially buildable native Windows or Linux version at this point. Such a build wouldn't be all that interesting without a compile option either, I'd say – and that part would definitely fall into the realm of contribution-ideas or require dedicated funding.

Gameplay details

These are quickly summarized:

  • At the beginning of each Story Mode stage and VS round, TH03 reloads these formations into a dynamically allocated buffer. Then, it randomizes them into a single global cycle of 256 formations (while making sure not to show the same formation twice in a row) and 256 random flags that indicate whether the respective formation's positions and velocities should be horizontally mirrored. This means that you'll see this cycle of formations repeating if a round lasts long enough – although, given the average formation length of 3.93 seconds, this would take about 16:45 minutes of gameplay.
  • Once the enemy system reports no living enemies for a player, they are advanced to the next formation on the next frame, where the game then spawns all of the formation's enemies at once. The check treats enemies as no longer alive once they have started playing the 36-frame explosion animation, which means that TH03 will always at least try to render either an enemy or an enemy-originating explosion on every single frame of gameplay. Enemies might get clipped and not actually be blitted, but the game at least tries to. This fact will become very important later on.
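One way to build such a cycle is to reroll each pick until it differs from the previous one. Whether TH03 does it this way is not something the summary above states, so treat this purely as an illustration. `std::mt19937` stands in for the game's own random number generator, all names are hypothetical, and the sketch does not care whether the last and first entries of the cycle match.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

constexpr std::size_t CYCLE_LENGTH = 256;

struct FormationCycle {
    std::array<uint8_t, CYCLE_LENGTH> formation_id;
    std::array<bool, CYCLE_LENGTH> mirrored;
};

// Fills the cycle with random formation IDs, never picking the same ID
// twice in a row, plus one random mirror flag per entry.
inline FormationCycle randomize_cycle(unsigned formation_count, std::mt19937& rng) {
    // At least 2 formations are needed to avoid repeats; 24 is the stated cap.
    assert(formation_count >= 2 && formation_count <= 24);
    FormationCycle cycle{};
    std::uniform_int_distribution<unsigned> pick(0, (formation_count - 1));
    std::bernoulli_distribution coin(0.5);
    for (std::size_t i = 0; i < CYCLE_LENGTH; i++) {
        unsigned id = pick(rng);
        while ((i > 0) && (id == cycle.formation_id[i - 1])) {
            id = pick(rng); // reroll on immediate repeats
        }
        cycle.formation_id[i] = static_cast<uint8_t>(id);
        cycle.mirrored[i] = coin(rng);
    }
    return cycle;
}
```

Each player would then walk through this shared cycle with their own cursor, advancing whenever the enemy system reports no living enemies for them.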

As usual for these sorts of script file-formats, I once again wrote a full…

Command reference

…which only requires a few additional bits of context to understand:

  • All parameters are 1 byte large. Data types of common values are as follows:
    • angle*: 📝 8-bit angles
    • clip*: bool
    • duration: Unsigned
    • speed*: Q4.4 fixed-point, unsigned
    • velocity*: Q4.4 fixed-point, signed
  • Most of these functions mutate a few state fields of each enemy, whose last values can therefore be shared across functions:
    • angle: Used for linear, wavy, and circular motions. This field uses the 16-bit angle type described above.
    • speed: The Q4.4 subpixel value that scales every fully recalculated 2D velocity vector calculated from angle.
    • velocity: The Q12.4 delta added to the enemy's X and Y coordinates during all movement functions.
  • Functions are executed on the same frame until the script hits a blocking function with a duration parameter.
  • Unused commands are in gray.
Opcode  Function and parameters  Description
0x00  Stop
      Immediately removes the enemy from the playfield and stops script execution.
0x01  Linear move (angle, speed, duration)
      Sets the enemy's angle and speed fields to the given parameters, recalculates the velocity, and moves the enemy at this velocity for the given duration.
0x02  Circular move (angle_start, speed, angle_speed, duration)
      Moves the enemy along a circular path. Calculates a new velocity from the given speed and the sine and cosine of angle on every frame. Starts by resetting angle to angle_start, and then adds angle_speed on every frame of the motion.
0x03  Wait (duration)
      Does nothing for a while. Exclusively used to delay later enemies that follow the same path as earlier ones.
0x04  Wavy X / linear Y move (speed_x, angle_speed, velocity_y, duration)
0x05  Wavy Y / linear X move (speed_y, angle_speed, velocity_x, duration)
      Moves the enemy on a sinusoidal path. Calculates a new velocity by applying a cosine oscillation multiplied by the given speed on the wavy axis while using the given constant velocity on the linear axis. Starts out with angle at 0x00 and adds angle_speed on every frame of the motion.
      Most notably, this is the only movement type used in formations #1 (Drunkards) and #12 (Off-centered Crossing).
0x06  Move (duration)
      Continues moving the enemy at its current velocity.
0x07  Set speed and move (speed, duration)
      Like 0x01, but only sets the enemy's speed to the given parameter, retaining its current angle for the velocity calculation.
0x08  Linear move, stopping at player Y (angle, speed, duration)
0x09  Linear move, stopping at player X (angle, speed, duration)
      Like 0x01, but stops if the enemy's center coordinate intersects with the player sprite on the Y or X axis.
0x0A  Directional circular move (angle_start, speed, angle_speed, velocity_x_plus, velocity_y_plus, duration)
      Like 0x02, but adds a constant vector on top of the recalculated per-frame velocity.
0x10  Spawn (center_x÷8, center_y÷8, size_words[4], hp[4], clip_x, clip_bottom, unused)
      Makes the enemy appear. The center_x and center_y values are in playfield space; storing them as multiples of 8 is a neat way to cover a range from (-128 × 8) = -1024 to (+127 × 8) = +1016 pixels in a single byte. size_words (the size of the enemy in multiples of 16) and hp are conveniently indexed with enemy_speed (see above).
0x80  Loop (absolute jump) (target, count)
0x81  Loop (relative jump) (disp, count)
      Supposed to loop the block between the current instruction pointer (IP) and target (0x80) or (disp + IP) (0x81), but broken due to several bugs in the implementation. These bugs would later be fixed in TH04, where such a loop appears in the very first enemy formation of Stage 1.
0x82  Set clip_x flag
0x83  Set clip_bottom flag
      Sets either of the two flags to true. Useful if the enemy was previously spawned with clip_x or clip_bottom set to false: In that case, the enemy first gets to live outside the boundaries of its playfield without being clipped, and this call can then reactivate clipping for proper removal later in the script. This can be seen in formations #16 (Zigzag) and #17 (Flying Junction).
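Since every parameter is a single byte, decoding the common value types comes down to a handful of conversions. The helpers below are hypothetical and not part of the actual dumper; they merely illustrate the data types listed above, with explicit sign handling instead of narrowing casts.

```cpp
#include <cstdint>

// Reinterprets a raw parameter byte as a signed 8-bit value.
inline int param_signed(uint8_t raw) {
    return ((raw >= 0x80) ? (raw - 0x100) : raw);
}

// speed*: unsigned Q4.4 fixed-point, in pixels per frame.
inline double param_speed(uint8_t raw) {
    return (raw / 16.0);
}

// velocity*: signed Q4.4 fixed-point, in pixels per frame.
inline double param_velocity(uint8_t raw) {
    return (param_signed(raw) / 16.0);
}

// angle*: 8-bit angle, where 0x100 would be a full circle.
inline double param_angle_degrees(uint8_t raw) {
    return (raw * (360.0 / 256.0));
}

// center_x÷8 / center_y÷8 of the spawn command, in playfield pixels.
inline int param_spawn_center(uint8_t raw) {
    return (param_signed(raw) * 8);
}
```

For example, a spawn coordinate byte of 0x80 decodes to -1024 pixels and 0x7F to +1016 pixels, which is the range mentioned in the spawn command's description.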

With no way of deduplicating the multiple enemies that fly along the same path in most formations, the scripts in ENEDAT.DAT end up highly copy-pasted, with individual enemies often only differing in the initial delay. It's hard to criticize this though, as this simple non-abstracted approach also provided the flexibility for ZUN to have enemies on multiple paths within the same formation in the first place.

If there is one design flaw in this format, it's the mere existence of the stop instruction. It does fulfill the role of terminating the script interpreter, whose instruction pointer would otherwise continue past the end of an enemy's script, but even that detail wouldn't justify its existence if we consider the design of these enemy formations. Every enemy of every formation is designed to fly past some edge of the playfield, get clipped, despawn, and end script execution that way, yet they still remove themselves using this stop instruction instead of relying on playfield edge clipping to do so.
As usual, I wouldn't mention this if it didn't cause at least one quirk in the game. If we look closely at formation #13 (Folding "7"), we see that the center enemies on the straight vertical path call stop() just a bit too early and despawn themselves just a few pixels above the bottom edge of the playfield:

You could equally argue that this quirk is simply caused by the human error of passing a duration to move_linear() that is shorter than it should have been. Since the playfield is 368 pixels tall, a 64-pixel enemy spawned at a top Y position of (-32 - (64/2)) = -64 and moving at a velocity of 4 pixels per frame would need to keep moving for ((64 + 368)/4) = 108 frames rather than just 100. But removing the stop instruction would have made that quirk obvious by causing clearly visible glitches as a result of the script instruction pointer no longer being kept from running past the end of the respective enemy scripts.
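The arithmetic behind that duration can be written down as a small helper. This is just a sketch with made-up names, assuming integer pixel coordinates and a constant positive downward velocity.

```cpp
#include <cassert>

// Number of frames until an enemy's top edge has reached the bottom edge
// of the playfield, rounded up.
inline int frames_until_below_playfield(int top_y, int playfield_h, int velocity_y) {
    assert(velocity_y > 0);
    const int distance = (playfield_h - top_y);
    return ((distance + velocity_y - 1) / velocity_y);
}

// Top Y coordinate after moving for the given number of frames.
inline int top_y_after(int top_y, int velocity_y, int frames) {
    return (top_y + (velocity_y * frames));
}
```

`frames_until_below_playfield(-64, 368, 4)` returns the 108 frames from above, while `top_y_after(-64, 4, 100)` returns 336, which leaves the enemy's top edge 32 pixels short of the bottom playfield edge at the point where the script calls stop().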


Enemies

Despite all the explanations above, there's still a bit left to be said about enemies themselves:

  • The enemy cap is 40 across both playfields. This is only 8 enemies more than the 16×2 enemies that formation #8 (Four Lines of Four), the formation with the highest enemy count, would need when running simultaneously for both players. Due to how TH03 instantly spawns all enemies in a formation and the fact that exploding enemies still take up an enemy slot, it is absolutely possible to hit this cap during regular gameplay and cause interesting formation-related quirks. The simplest test case would be this one:

    • Randomize enemy formations as usual, but start the game with one of the various 12-enemy formations and then continue with formation #8. Let's pick formation #0 (Loop-de-Loop), which has a generous 1-second window where all 12 enemies are visible on the playfield.
    • During this window, both players then need to fire a bomb at roughly the same time, turning all 24 living enemies into explosions.
    • Then, the game will advance both players to formation #8, with P1 taking precedence over P2 if both players bombed on the exact same frame. However, the game will then only have enough free enemy slots to exactly fit the 16 enemies for the one player that spawns them first.

    If both players do bomb on the exact same frame, this happens:

    The spawn call for every single one of P2's enemies ends up doing nothing because all enemy slots are occupied. Consequently, the enemy system continues to report no living enemies for P2, which causes the formation system to advance P2 to the next formation on the following frame. This process repeats for 29 more frames until the first three of both P1's and P2's enemies have finished their explosion animation. Then, P2 finally finds itself with enough free enemy slots to spawn at least 6 of the 8 enemies of formation #9 (Inverse "Y"), its then current 30th formation of the randomized sequence – and a formation sequence cursor that has moved far ahead of P1's.
    This way of seeing into the future of the formation sequence might give P1 an advantage in competitive matches, as they can then prepare for the correct chaining route in advance. Then again, this would still require P2's cooperation – if just one of the players bombs a 12-enemy formation that is followed by the 16-enemy formation #8, the 40-enemy cap will be exactly enough to fit P1's then 28 enemies without further quirks.

  • The difference between ghost, star, heart, and crescent enemies is purely visual, in case anyone was still wondering.
  • The formation scripts might have already hinted at the fact that enemy HP are 0-based. Thus, enemies are only destroyed once they would reach -1 HP, not 0. Each enemy can also only lose up to 1 HP per frame.
  • Killing enemies with an explosion adds the following charge values, depending on the hit count of the explosion's chain, to… not the explosion's chain, but the chain in slot ⌊round_speed/2.0⌋, which may or may not be active? Wait, what?! :zunpet: This is most likely why these charges aren't reset when starting a new chain.
    Hit count of explosion's chain | Effect
    ≤2 | charge_fireball +1
    ≤(9 - ⌊round_speed/2.0⌋) | charge_fireball +2
    ≤(14 - (⌊round_speed/2.0⌋ × 2)) | charge_fireball +3, charge_exatt +1
    (greater) | charge_fireball +4, charge_exatt +1
    The game only picks the first row whose hit count matches the condition, so these are not cumulative.
    Contrary to pellets and fireballs, these are not multiplied if more than one explosion hits an enemy on the same frame.
  • On the 📝 collision bitmap against players, each enemy is represented by a striped rectangle that's 0.375× the size of the enemy and placed in its center.
  • The hitboxes against player shots are very generous. Perhaps they might look overly so in the figure below, but that's mainly due to TH03 collision-detecting each pair of player shots as a single 32×16 rectangle. The basic hitbox of each enemy is a square in its center that's 0.875× its size, except for 16-pixel enemies, which receive a 20×20 square rather than the 14×14 one they'd otherwise get.
    [Figure: TH03's ghost, star, heart, and crescent enemy sprites at 16×16, 32×32, 48×48, and 64×64 pixels, each with its hitbox against player shot pairs]
    📝 As 📝 usual, a 32×16 shot pair has to lie fully within the red area for a hit to be registered.
    I'd 📝 usually also like to visualize the exact shape against every type of bullet or shot, but with 4 frames of animation that sometimes change their shape quite heavily × 9 characters, it would be a bit too much – and ultimately not that helpful with such generous hitboxes either.
  • Final fun fact: Kotohime's Charge Shot one-hits enemies regardless of HP. This is a deliberate feature that ZUN explicitly activates by setting a dedicated flag.
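The charge table for explosion kills from the list above can be condensed into a first-match lookup. This is only a minimal sketch: `KillCharge` and `charge_for_explosion_kill()` are hypothetical names, and I'm assuming that `round_speed` can be treated as a non-negative integer here, so that integer division matches ⌊round_speed/2.0⌋.

```cpp
// Charge added to the chain slot for a single explosion kill.
struct KillCharge {
    int fireball; // added to charge_fireball
    int exatt;    // added to charge_exatt
};

// Returns the first row of the table whose condition matches the hit count
// of the explosion's chain. Rows are not cumulative.
// Assumption: `round_speed` is a non-negative integer.
KillCharge charge_for_explosion_kill(int hit_count, int round_speed) {
    const int half_speed = (round_speed / 2);
    if (hit_count <= 2) {
        return { 1, 0 };
    }
    if (hit_count <= (9 - half_speed)) {
        return { 2, 0 };
    }
    if (hit_count <= (14 - (half_speed * 2))) {
        return { 3, 1 };
    }
    return { 4, 1 };
}
```

Note how higher round speeds shrink the two middle rows, so long chains reach the +4/+1 row earlier.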

For reasons that will soon become apparent, I'll skip over fireballs for now. Instead, let's continue with…

Explosions

Collision detection is the only piece of explosion-related code shared between enemies and fireballs, so we can visualize most of the gritty internal details by simply looking at the hitbox frame data… and of course you're getting all 30 permutations of explosion size/source and destroyable entity:

Thanks to the anonymous backer for providing the Anything budget for these multi-row tabs at the end of the 6th push!

We can summarize these 30 videos in just 5 bullet points:

  • Explosions can only ever deal damage once every 4 frames. Against pellets and fireballs, this applies across all 36 frames of an explosion, resulting in 9 possible frames they can be destroyed on.
  • Hits against enemies are further restricted to a window of frames that roughly corresponds to the frames of animation with lots of sparkles in a circle. Since the game flips between these two animation frames on the same 4-frame cycle that the collision detection runs on, the damaging frames are always indicated by a sprite change.
  • For fireball-originating explosions, this window starts 4 frames later than it does for enemy-originating explosions. Whether this was a deliberate design choice or just an unintended inconsistency resulting from outdated copy-pasted code is anybody's guess.
  • This leaves us with only 5 or even just 4 frames where an explosion can possibly damage an enemy: frame 13 (for enemy-originating explosions), 17, 21, 25, and 29.
  • Fireball-originating explosions use the same sprites as explosions that originate from 48×48-pixel enemies, but are rendered 8 pixels lower than their internal position used for collision detection. This is the only fixable bug in all of the systems covered in this post, and it's already fixed in the new Anniversary Edition release.
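The enemy-damaging frames from the list above can be written down as a small predicate. This is a sketch under stated assumptions rather than ZUN's actual code: the function name is made up, and it only encodes the frame numbers given above (13, 17, 21, 25, and 29, with frame 13 being exclusive to enemy-originating explosions).

```cpp
// Returns whether an explosion on the given frame of its animation can
// damage enemies. Hypothetical helper that only reproduces the frame
// numbers listed above.
bool explosion_can_damage_enemy(int frame, bool from_fireball) {
    // The window starts 4 frames later for fireball-originating explosions.
    const int first = (from_fireball ? 17 : 13);
    const int last = 29;
    if ((frame < first) || (frame > last)) {
        return false;
    }
    // Damage is only dealt on every 4th frame within the window.
    return (((frame - 13) % 4) == 0);
}
```

Counting the frames that pass this check over a full 36-frame explosion yields 5 for enemy-originating explosions and 4 for fireball-originating ones.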

But there is one more obscure detail that we can't see in these frame data videos. Even within the window where explosions can hit enemies, TH03 further caps the total amount of damage that a single explosion can deal to them. Each enemy-originating explosion can deal up to (size_words × 2) HP of damage in total, while fireball-originating explosions can deal up to 4 HP if the explosion originated from a blue fireball and 6 HP if it originated from a red one. ZUN's code suggests that this detail is one of the most accidental features in the entire game – because of course he did not separately store the damage cap in one of the 14 padding bytes within the explodable structure, but simply reinterpreted size_words (for enemies) or the variant field (for fireballs) as half of the cap's value. :zunpet:
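As a rough sketch of how such a reinterpreted cap behaves: the structure and function names below are hypothetical, and the variant values of 2 (blue) and 3 (red) are only inferred from the 4 HP and 6 HP caps being twice the field's value.

```cpp
#include <cstdint>

// Hypothetical, heavily reduced stand-in for an explosion instance.
struct ExplosionSketch {
    // size_words for enemy-originating explosions, variant for
    // fireball-originating ones (assumed: 2 = blue, 3 = red).
    uint8_t size_words_or_variant;
    uint8_t damage_dealt;
};

// The cap is not stored anywhere, only derived from the shared field.
int damage_cap(const ExplosionSketch& explosion) {
    return (explosion.size_words_or_variant * 2);
}

// Deals 1 HP of damage if the explosion still has budget left.
// Returns whether the hit was registered.
bool try_damage_enemy(ExplosionSketch& explosion) {
    if (explosion.damage_dealt >= damage_cap(explosion)) {
        return false;
    }
    explosion.damage_dealt++;
    return true;
}
```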
This limitation even seems largely pointless from a game design standpoint at first:

  • The formation scripts always send more than one enemy along the same path.
  • Hence, explosions from earlier enemies will cause subsequent enemies to explode right next to existing enemies.
  • These new explosions will come with a fresh hit counter, which in turn will ensure the destruction of the next enemy on the same path, and so on.

Sure, once we consider the small number of frames where explosions can actually hit enemies, this might make it less likely for older explosions to kill enemies further along the formation's path, but does it really matter in practice?
Turns out that we have to look no further than formation #4 (Racetrack), which sends 12 enemies with increasing HP values along the same path. If we colorize explosions based on the total amount of damage they've dealt, we can clearly see the desync just 48 frames after the first explosion:

I chose black for explosions that currently can't deal damage to enemies. The other 6 colors (dark green, light green, dark red, light red, dark blue, and light blue to represent 0 to 5 dealt hits in this order) are TH03's regular in-game VRAM colors from #5 to #10.

And now imagine a more complex game state where these earlier explosions might in turn cause hit-count-dependent pellets, charged fireballs, or charged Extra Attacks to fire earlier…


Only one left to go then!

Fireballs

As usual, let's start with a few facts in bullet-point form:

  • Fireballs can, indeed, only initially enter the playfield as a result of a chain's fireball charge being sufficiently high.
  • The cap for fireballs is 24, again across both playfields. Since this cap doesn't interact with other gameplay systems, this seems plenty enough.
  • The target center X coordinates for newly spawned fireballs are random, but the possible value range alternates every 256 frames between
    • the initial setting of covering the entire playfield (i.e., [0.0; 288.0[), and
    • being restricted to the 32 pixels to the right of the target player's center X coordinate at the time the fireball was spawned.
    Just like bullets, fireballs also need to 📝 physically travel through the 32-pixel border between the two playfields during their transfer phase.
  • The HP of fireballs behaves similarly to enemies: Fireballs can also only lose up to 1 HP per frame, and are destroyed once they would reach -1 HP, not 0. This is why the code says that blue fireballs have 2 HP and red fireballs have 3 HP, despite them taking 3 and 4 player shots to destroy, respectively.
    The main difference lies in how these HP only matter against regular player shots and most Charge Shots. Both explosions and, again, Kotohime's Charge Shot destroy fireballs in a single hit.
  • There is code to move falling fireballs on the X-axis as well, but it won't have any effect in the original game because the X velocity of falling fireballs is always set to 0.
  • Bombing immediately removes all fireballs from the field of the player who fired a bomb.
  • Fireballs only turn into explosions when they are caught in other explosions, whose chains must have ultimately started by destroying an enemy.
  • On the 📝 collision bitmap against players, each falling fireball is represented by a striped 16×16 rectangle in its center.
  • Once again, the hitbox against player shots is roughly where you would expect, and not worth worrying about:
    Hitbox of TH03's fireballs against player shot pairs

The more interesting parts about fireballs all relate to their destruction. Let's start with destruction by explosions, which seems to always send one new red fireball to the other player's field. In reality, these spawns are limited by a "generation number" system: Each blue fireball starts with a generation number of 0, while each red fireball starts, on paper, with 1 plus the generation number of whatever fireball was involved in its creation. In the case of explosions, the game obviously uses the generation number of the exploding fireball itself, but will only spawn such a new fireball if that number is <4.
This might look like even more of a rarely-triggered and expendable gameplay detail than the enemy damage cap we've seen above. How often does the "same" fireball really get transferred between both playfields 4 times in a row, without any player dropping a single generation? Well, exactly this case happens a mere 41.8 real-time seconds into 📝 the very first demo of Reimu vs. Mima. Removing the generation limit would fork gameplay as an explosion would then spawn another 5th-generation fireball that otherwise wouldn't have been there.

Sometimes, however, the game also appears to spawn additional red fireballs in response to destroying only a single one with an explosion. The cleanest example I could find in the four demos happens 1:05 minutes into the Kotohime vs. Marisa demo:

Note how the "primary" new fireball described above is spawned from the center of its explosion, while the "additional" new fireball is spawned from the center of the explosion that caused the old fireball to explode.

With all the features we've previously looked at, it's easy to explain why we get that second fireball – because the chain's fireball charge value was high enough to spawn one. But why is this new fireball red? Doesn't 夢時空.TXT say that red [fireballs] have been sent back at least once?

赤いのは、1回以上送りかえされたことのある奴です。 ("The red ones are those that have been sent back at least once.")

Yup, 📝 another case where the manual is flat out wrong. The color variant of newly spawned fireballs is read from a global variable that is set to red at the beginning of the fireball collision detection function, set back to blue before returning, and not modified anywhere else. Hence, this color variable does not just apply to the fireballs spawned directly in the collision handler itself, but also to any fireballs spawned indirectly via the chaining system as a result of fireballs colliding with explosions. Given the significant gameplay difference between red and blue fireballs, this is quite a serious error in the manual, I'd say… :thonk:

But it gets even quirkier. Which generation number do these chain-charged red fireballs spawn with? Logically, you'd expect 0, but red fireballs are always spawned with a generation number of at least 1. Apart from that, though, there is no other generation number that the code could meaningfully propagate. These fireballs are spawned by the same generic explosion collision handler that would also spawn blue fireballs upon destroying pellets, which only has access to a hitbox and not to the structure instance of the object it tests all explosions against.
But once again, the fireball spawn function reads the previous generation number from a global variable, so it will assign… 1 plus whatever generation number the previous explosion-destroyed fireball had. :tannedcirno:
And yes, the previously destroyed fireball, not the current one, because the collision handler that spawns these chain-charged fireballs is called before the global variable is updated with the affected fireball's generation number.
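Since the order of global variable writes is what matters here, a small model might help. All names in this sketch are made up, and the `chain_fires_fireball` flag merely stands in for the chain's fireball charge being high enough; the only things taken from the explanation above are the two globals, the <4 generation limit, and the order of operations.

```cpp
#include <cstdint>
#include <vector>

enum class Variant : uint8_t { Blue, Red };

struct FireballSketch {
    Variant variant;
    uint8_t generation;
};

// Hypothetical stand-ins for the two globals read by the spawn function.
static Variant spawn_variant = Variant::Blue;
static uint8_t spawn_prev_generation = 0;
static std::vector<FireballSketch> spawned;

void fireball_spawn() {
    // Blue fireballs start at 0, red ones at 1 plus the *global* value.
    const uint8_t generation = ((spawn_variant == Variant::Red)
        ? static_cast<uint8_t>(spawn_prev_generation + 1)
        : 0
    );
    spawned.push_back({ spawn_variant, generation });
}

// Generic explosion collision handler: no access to the tested fireball.
void explosion_hittest(bool chain_fires_fireball) {
    if (chain_fires_fireball) {
        fireball_spawn(); // "additional", chain-charged fireball
    }
}

void fireballs_hittest(
    const std::vector<FireballSketch>& hit_by_explosion,
    bool chain_fires_fireball
) {
    spawn_variant = Variant::Red;
    for (const auto& fireball : hit_by_explosion) {
        // Runs *before* the global is updated for the current fireball.
        explosion_hittest(chain_fires_fireball);
        spawn_prev_generation = fireball.generation;
        if (fireball.generation < 4) {
            fireball_spawn(); // "primary" fireball
        }
    }
    spawn_variant = Variant::Blue;
}
```

Feeding this model two exploding fireballs with generations 3 and 1 makes the second chain-charged fireball spawn with generation 4, derived from the first fireball rather than the one that actually triggered it.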

You can also destroy a fireball without an explosion though. Internally, this is done by removing the old fireball and spawning a new transferring red one at the same position. This new fireball will start with the old fireball's generation number incremented by 1, matching the explanation of the generation system from above. Unlike explosion-destroyed fireballs, there is no limit to the number of times a particular fireball can be transferred in this way. So you can go higher than 5 generations as long as you keep using shots to transfer fireballs back and forth – and if you manage 256 of those transfers, the generation counter will overflow back to 0. :onricdennat:
But then, the game logic goes completely off the rails in the rest of the collision handler. For starters, this kind of fireball destruction still increases the Extra Attack charge of a chain, despite the fact that player shots and Charge Shots exist completely separate from the chaining system and the fact that the respective player may not even have any active explosions on their playfield. ZUN simply takes the chain slot that the next explosion would be assigned to, and adds +1 (for blue fireballs) or +2 (for red fireballs) to its charge_exatt value. Second reason why these charges aren't reset when starting a new chain, I guess…

And then, the code performs another charge_exatt firing check without adding any new charge, in what could have only possibly been a leftover function call that was supposed to go into the branch that handles fireball destruction by explosions. On this non-explosion code path, the chain_slot variable used in this check remains uninitialized, which leads to a potential read and write access outside of the bounds of this array…


…wait a moment, this immediately reminds me of certain bug reports from players that smelled like they were caused by out-of-bounds array accesses in unrelated parts of the game. I've been waiting to find one of these accesses ever since I heard about these bugs, and it's great that I immediately stumbled over the issue on my first RE pass over the fireball collision handler. Could it be that I've just found…

TH03's score reduction and extend glitches

Yup. In their thousands of hours of mastering this game's intricate systems, the gameplay community has encountered two very rare and seemingly random glitches without any obvious cause:

  1. Random score reduction
  2. Two extra Story Mode lives out of nowhere

And indeed, both of them could be caused if charge_exatt were indexed with a chain_slot ≥16, which would cause certain bytes to accidentally get reinterpreted as an Extra Attack charge value, compared against round_speed, and reset to 0 if their value is high enough. Want to take a guess at which particular pieces of data lie particularly close to charge_exatt in TH03's original memory layout? Well…

DS offset | Contents of bytes +00 to +0F
4B3E | P1 Extra Attack charge per chain
4B4E | P2 Extra Attack charge per chain
4B5E | (temporary data), P1 score digits (📝 little-endian BCD), first part of P2 score digits
4B6E | remaining P2 score digits (📝 little-endian BCD), (temporary data), ☯️
With ☯️ being the number of Story Mode extends gained. These score digits do not include the one's digit, which represents the number of continues used and is stored separately.
The 16 bytes below hold more gameplay-relevant data relating to Yumemi's Charge Shot and Gauge Attack, but researching that was out of scope for this delivery.

This gives us the following chain_slot ranges for the two glitches, depending on which player destroys the fireball:

  1. If chain_slot is between 0x26 and 0x35 inclusive (if P1 destroys the fireball) or between 0x16 and 0x25 inclusive (if P2 does), the charge check will hit one of the score digits, causing the score reduction variant of the glitch if the digit's value is high enough.
  2. If chain_slot is exactly 0x3E (P1) or 0x2E (P2), the charge check will hit the byte that controls the number of Story Mode extends gained, causing the extend variant of the glitch if the byte's value is high enough.
    To understand how this can work, we need to look into how ZUN implemented TH03's extend system:

    • At the start of each round, extends_gained is initialized to the value of P1's second-highest score digit, or to 255 if the overall score is ≥20 million.
    • On every frame, the game then compares P1's second-highest score digit against extends_gained. If the digit is higher, it then grants an extend and increments the byte accordingly.
    • At a score of ≥20 million, extends_gained is set to 255, which is higher than any possible single digit and thus blocks the game from granting any further extends. Otherwise, you'd get additional ones at 110 million, 120 million, 210 million, …

    If you edit memory and set extends_gained from any nonzero value to 0, the game will therefore think that you haven't received any extends yet, and newly grant you as many of them as you're supposed to have with P1's current score: one additional extend if the score is ≥10 million, and two if your score is ≥20 million. There are no other checks that prevent the game from granting extends under this condition, so you could repeat this process until you've reached 255 lives, the maximum possible value.
    The "blocking" value of 255 also explains how this glitch can exist in the first place. It can only work if the game sets extends_gained to 0 on its own as a result of the Extra Attack charge check, and the regular values of 0, 1, and 2 are all smaller than the smallest possible Extra Attack charge value of 3. 255, on the other hand, is larger than any possible charge value. Thus, this glitch can only happen if you have ≥20 million points, but then, it will consistently happen whenever its other conditions are met.
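The extend logic described above is small enough to model directly. This is a minimal sketch with hypothetical names, assuming that the "second-highest score digit" is the 10-million digit and that the blocking value of 255 is only written at the start of a round; the original game might also write it elsewhere.

```cpp
#include <cstdint>

struct ExtendSketch {
    uint8_t extends_gained = 0;
    int lives = 0;
};

// Assumption: the second-highest score digit is the 10-million digit.
uint8_t ten_million_digit(uint32_t score) {
    return static_cast<uint8_t>((score / 10000000u) % 10u);
}

void round_start(ExtendSketch& state, uint32_t p1_score) {
    state.extends_gained = ((p1_score >= 20000000u)
        ? 255 // blocks any further extends
        : ten_million_digit(p1_score)
    );
}

// Runs on every frame; grants at most one extend per call.
void frame_update(ExtendSketch& state, uint32_t p1_score) {
    if (ten_million_digit(p1_score) > state.extends_gained) {
        state.lives++;
        state.extends_gained++;
    }
}
```

Starting a round at 25 million points sets `extends_gained` to 255 and grants nothing. If the byte is then zeroed from the outside, the next two calls to `frame_update()` grant one extend each.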

But how likely could it possibly be for the chain_slot to fall within these exact ranges? After all, this bug is reported to only happen very rarely.
Turns out that explaining how and why we actually get these effects is not trivial in the slightest. Let's look at the stack layout across the relevant functions during a random frame of gameplay, and try to reason about the value we will actually find in this uninitialized local variable at runtime:

Function BP | SP after prolog Stack
far main()├── (SEG1:MAI) 1000 | 0FFE INST 
near round_main()├┬─ (SEG1:RMN) 0FFA | 0FFA INST →MAI 1000 
far enemies_render()│├┬ (SEG4:ENR) 0FF4 | 0FF0 INST →MAI 1000 SEG1 →RMN 0FFA INST INED 
near enemy_put() / enemy_explosion_put()││└ (SEG4:ENP) 0FEC | 0FE4 INST →MAI 1000 SEG1 →RMN 0FFA INST INED →ENR 0FF4 en_x en_y en_s en_p 
far fireballs_hittest_and_render()│├┬ (SEG4:FHR) 0FF4 | 0FF2 INST →MAI 1000 SEG1 →RMN 0FFA INST INED →ENR 0FF4 en_x en_y en_s en_p
near fireballs_hittest()││├ (SEG4:FHT) 0FEE | 0FEA INST →MAI 1000 SEG1 →RMN 0FFA INST →FHR 0FF4 0FF4 INST en_y en_s en_p

Where:

  • The purple byte in fireballs_hittest() corresponds to chain_slot.
  • The blue and red lines indicate the values of the BP and SP registers.
  • INST and INED are the offsets of the start and end of the "init function" array that holds pointers to functions that should run before main(), annotated using #pragma startup. crt0 happens to load these offsets into the SI and DI registers prior to calling main(). Borland's calling conventions then require the previous value of these registers to be preserved by every function that needs these registers, which is done by pushing them on the stack. And since neither main() nor round_main() use SI or DI, we end up seeing additional stack copies of their original values – i.e., INST and INED – in a few other functions further up on the call stack.
  • Labeled addresses prefixed with → correspond to the offset of the following instruction at the current function's call site, where execution will return to.
  • en_* are local variables in the single-enemy rendering function. (Top-left X/Y coordinate, half of its size in screen pixels, and a pointer to the enemy instance.)
  • All values are in hexadecimal, obviously.

That only leaves us with a handful of explicit numbers in this table, all of which are copies of the previous function's base pointer (BP). These get saved onto the stack as part of the typical x86 function prolog, which works like this:

  1. The CALL instruction pushes the caller's code segment (for far calls) and the offset of the next instruction on the call site, before jumping to the address indicated by CALL's operand.
  2. The new function then executes either ENTER <stack size>, 0 or the more RISC-y and consistently faster equivalent sequence of PUSH BP / MOV BP, SP / SUB SP, <stack size>, saving the previous function's base pointer and making room for all required local variables. The key insight here is that SP is simply subtracted. This is exactly where the "garbage values" of uninitialized variables come from, since they will start out with the value of whatever was previously written to their corresponding location on the stack.
  3. If the function needs to modify SI or DI, it also pushes those registers, which further decreases SP.
  4. Upon returning, the function executes the epilog counterpart of these instructions in reverse order:
    1. POP DI and POP SI if needed
    2. LEAVE or the equivalent sequence of MOV SP, BP / POP BP
    3. RETN (near) or RETF (far) to pop the instruction pointer and (for RETF) the caller's code segment

And it's these explicit numbers that reveal that ZUN got incredibly lucky:

  • The initial 0x1000 value for BP we see in main() exactly corresponds to Turbo C++ 4.0J's default stack size of 4,096 bytes.
  • Since ZUN makes very little use of the stack between main() and the two blitting functions, the top 8 bits of BP are guaranteed to be 0F by the time these two functions push that register onto the stack.
  • The address of the local chain_slot variable within fireballs_hittest() will then fall on exactly that stack location where the blitting functions previously wrote the 0F to…
  • … and since chain_slot is an 8-bit integer, it will therefore start out with an "uninitialized" value of 15, which just happens to be the very last valid index into the 16-element Extra Attack charge arrays.

This is also why it matters that the game renders at least one enemy or enemy-originating explosion on every frame of gameplay. The only time it renders neither is during the round start animation, but at that time, there also aren't any fireballs to be destroyed. Therefore, this exact tree of function calls is indeed the only one we need to look at, and proves that the formally undefined initial value of chain_slot is indeed deterministic on the one and only code path where it would be read from.

But from this alone, there should be no chance of the game ever writing outside the bounds of the Extra Attack charge array. So what could we possibly be missing here?
Well, just looking at MAIN.EXE's code flow doesn't actually give us the full picture. This is still a PC-98 game, and needs a full PC-98 system to run in the first place. And once we consider all the subsystems involved in running the game, we notice that three of them need to take control of the x86 instruction pointer at regular intervals:

  • Keyboard input (calling interrupt 0x09)
  • VSync (calling interrupt 0x0A), and
  • PMD's background task (calling interrupt 0x14)

If any of the corresponding IRQs fires, the CPU has to immediately call the corresponding interrupt handler. This requires saving the current CPU flags, code segment, and offset onto the stack, very much like a call to a regular function. While this would only affect the "inactive" area of the stack below SP at the time of the interrupt call, this area exactly controls the "uninitialized" value of future local variables. Thus, interrupts can easily modify these values beyond the state they would have as a result of the normal flow of code.
This theory is supported by the fact that players have reported the extend glitch in particular to occur much more frequently on underclocked PC-98 systems. If each clock cycle takes up a bigger fraction of a second and the three interrupts have to run at regular intervals, we'd get more of them on every frame of gameplay. Thus, slower clock speeds increase the chance of such an interrupt falling within the exact window of instructions that influences the initial value of the chain_slot variable.
Usually, running PC-98 Touhou over and over in underclocked emulators would make for an awful research and debugging experience. Thankfully, DOSBox-X's Turbo (fast-forward) option counteracts the slowdown in exactly the right way: While the reduced clock speed will stretch each 📝 logical frame to multiple real frames, fast-forwarding the emulation will then display all these frames as fast as possible and without blocking at the emulated VSync signal. The result runs very close to the intended 56.423 FPS, but with way more of those IRQs firing per second.

With this setup, let's see what happens if PMD's timer interrupt happens to fire somewhere near the end of enemies_render:

Function BP | SP after prolog Stack
far main()├── (SEG1:MAI) 1000 | 0FFE INST 
near round_main()├┬─ (SEG1:RMN) 0FFA | 0FFA INST →MAI 1000 
far enemies_render()│├┬ (SEG4:ENR) 0FF4 | 0FF0 INST →MAI 1000 SEG1 →RMN 0FFA INST INED 
near enemy_put() / enemy_explosion_put()││├ (SEG4:ENP) 0FEC | 0FE4 INST →MAI 1000 SEG1 →RMN 0FFA INST INED →ENR 0FF4 en_x en_y en_s en_p 
interrupt opnint()││└ (PMD_:INT) 0FF4 | 0FEA INST →MAI 1000 SEG1 →RMN 0FFA INST INED FLAG SEG4 →ENR en_y en_s en_p
far fireballs_hittest_and_render()│├┬ (SEG4:FHR) 0FF4 | 0FF2 INST →MAI 1000 SEG1 →RMN 0FFA INST INED FLAG SEG4 →ENR en_y en_s en_p
near fireballs_hittest()││├ (SEG4:FHT) 0FEE | 0FEA INST →MAI 1000 SEG1 →RMN 0FFA INST →FHR 0FF4 SEG4 INST en_y en_s en_p
With FLAG being the x86's 16-bit flag register that gets saved to the stack in addition to the current CS and IP registers when calling interrupt functions.

And there we have it. If any of those three interrupts is serviced within the very specific window of x86 instructions between the last rendered enemy and the call to fireballs_hittest(), the call to the interrupt handler will modify the stack in such a way that the future chain_slot variable will start out with the top 8 bits of the address of MAIN.EXE's fourth code segment, instead of its usually guaranteed value of 15. This will be more likely if the last living or exploding enemy was spawned into a slot close to the beginning of the 40-element enemy array, as TH03 will then spend more time looping over the rest of the structure without calling either of the two *_put() functions.
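Both stack tables can be replayed with a toy model that only tracks pushes and SP movement. This is a simplified sketch, not emulator output: `ToyStack` and the placeholder constants are made up, the position of chain_slot at stack offset 0x0FED (BP-1 inside fireballs_hittest()) is read off the tables above, and anything the functions or the interrupt handler write further below SP is ignored.

```cpp
#include <array>
#include <cstdint>

// Toy model of the 4,096-byte stack. Indices are offsets into SS.
struct ToyStack {
    std::array<uint8_t, 0x1000> mem{};
    uint16_t sp = 0x1000;

    void push(uint16_t value) {
        sp -= 2;
        mem[sp] = static_cast<uint8_t>(value & 0xFF); // little-endian
        mem[sp + 1] = static_cast<uint8_t>(value >> 8);
    }
    void reserve(uint16_t bytes) { sp -= bytes; } // SUB SP: writes nothing
    void release(uint16_t bytes) { sp += bytes; } // POP/RET: erases nothing
};

// Placeholders; only their position on the stack matters in this model.
constexpr uint16_t RET = 0xAAAA;   // any return offset
constexpr uint16_t REG = 0xBBBB;   // saved SI or DI
constexpr uint16_t FLAGS = 0xCCCC; // saved flag register
constexpr uint16_t SEG1 = 0x1111;  // first code segment

uint8_t initial_chain_slot(bool interrupt_after_last_enemy, uint16_t seg4) {
    ToyStack s;
    s.push(REG);                               // main(): SP = 0FFE
    s.push(RET); s.push(0x1000);               // round_main(): BP = 0FFA
    s.push(SEG1); s.push(RET); s.push(0x0FFA); // enemies_render(): BP = 0FF4
    s.push(REG); s.push(REG);                  // SI and DI: SP = 0FF0
    s.push(RET); s.push(0x0FF4);               // enemy_put(): BP at 0FEC
    s.reserve(8);                              // en_x, en_y, en_s, en_p
    s.release(8 + 2 + 2);                      // epilog + RETN: SP = 0FF0
    if (interrupt_after_last_enemy) {
        // The CPU pushes flags, CS, and IP. CS lands on 0FEC.
        s.push(FLAGS); s.push(seg4); s.push(RET);
        s.release(6);                          // IRET: SP = 0FF0
    }
    s.release(2 + 2 + 2 + 2 + 2);              // epilog + RETF: SP = 0FFA
    // fireballs_hittest_and_render(): BP = 0FF4, SP = 0FF2
    s.push(SEG1); s.push(RET); s.push(0x0FFA); s.push(REG);
    s.push(RET); s.push(0x0FF4);               // fireballs_hittest()
    const uint16_t bp = s.sp;                  // BP = 0FEE
    s.reserve(4);                              // locals: SP = 0FEA
    return s.mem[bp - 1];                      // stale byte read as chain_slot
}
```

Without an interrupt, the function returns 0x0F, the high byte of the BP value saved by enemy_put(). With an interrupt and a fourth code segment at, say, 0x2CCC, it returns 0x2C instead.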

Alright, now we know what the exact variant of the glitch depends on. But how likely is it for MAIN.EXE's fourth code segment to actually fall within the affected ranges of memory, and how can we influence this placement?
First off, since DOS predates ASLR, any specific DOS system will always load its kernel, the shell, and any other drivers at deterministic addresses every time, as long as you don't change any of the parameters that influence this placement. Consequently, the system will also end up with the exact same free regions of memory every time, causing DOS to load a game at the exact same address as well. Of course, since DOS is an open platform where you can get arbitrary code execution by just, uh, writing code, building, and running it, you can easily push the game higher in memory by writing a TSR that reserves the desired amount of memory and executing it before TH03. Pushing TH03 lower, however, would require either

  1. the infamous oldskool wizardry of freeing up as much conventional RAM as possible,
  2. changing the DOS kernel, or
  3. changing the 📝 version or type of the PMD driver loaded from GAME.BAT. (Note that this won't apply to the debloated or Anniversary Editions if you directly launch them via the debloat or anniv binaries because 📝 the integrated TSR spawning code pushes these drivers above the TH03 process).

Option 2) is particularly impactful, especially if we compare earlier DOS kernels with later ones:

  • The first few major versions of MS-DOS load their kernel as a single large contiguous block near the start of conventional RAM, which pushes PMD and the game to the highest segment values I've observed in my tests.
  • With MS-DOS 5, the kernel started pushing more and more of its code into the Upper Memory Area. This causes the MS-DOS 6.20 in a certain widely circulating .HDI of TH04 to load PMD and the game at addresses that will be roughly 50 KB lower than the ones we see in MS-DOS 3.3.
  • DOSBox-X doesn't need to reserve any conventional memory for a kernel because its entire kernel is implemented in native code. However, it still needs to spend about 16 KB of low-memory space on run-time data structures for compatibility with programs that directly want to read them, and then chooses to add just as much padding to work around issues in certain games. With COMMAND.COM consistently at segment 0x0800 and the games following immediately after, the resulting segment values strongly resemble those of MS-DOS 6.20.
  • The MS-DOS 7 bundled with Windows 95 features a slightly larger kernel and COMMAND.COM, which again pushes the games slightly higher compared to MS-DOS 6.20.

And this trend matches exactly with the addresses for segment #4 that we can observe when we go out into the wild and compare various distributions of TH03 against each other:

Game distribution / Kernel | PC-9801-26K (PMD.COM) | PC-9801-86 (PMD86.COM) | PC-9801-73 (PMDB2.COM)
That one old widely circulating .HDI*, MS-DOS 3.3 | 0x3927 | 0x3A83 | 0x3A2B
Custom†, MS-DOS 6.20 | 0x2CCC | 0x2E28 | 0x2DD0
Raw files, current DOSBox-X | 0x2C91 | 0x2DED | n/a
2021 Ultimate Collection††, MS-DOS 7, DOSBox-X | 0x2C82 | n/a | 0x2D86
2021 Ultimate Collection††, MS-DOS 7, Neko Project | 0x2D98 | 0x2EF4 | 0x2E9C

* The Touhou98 Experience v3.00 release is also built around this .HDI.
† Created by copying the TH03 files onto that one old widely circulating TH04 .HDI that needed a later DOS version due to 📝 the no-EMS crash bug. Representative of other 5-game .HDI setups that might be floating around the Internet or that people have built for themselves, or real-hardware setups.
†† The bold columns indicate the default setting the package came with.

Cross-referencing these with the memory map above then gives us the following affected addresses:

Slot | Fireball destroyed by P1 | Fireball destroyed by P2
0x2C | P1's 10-million digit | temporary data
0x2D | P1's 100-million digit | temporary data
0x2E | P2's 10's digit | Extends gained
0x39 | temporary data | not yet RE'd
0x3A | temporary data | not yet RE'd
I guess we can thank spaztron64 for accidentally building just the right setup to reveal the extend glitch in the first place. I certainly needed to be told of its existence by the gameplay community!
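The cross-referencing itself is just address arithmetic on the memory map. In the following sketch, the two array base offsets come straight from the map, while the byte ranges for the score digits, the temporary data, and the extends byte are derived from the chain_slot ranges and the slot table given earlier. The enum and function names are hypothetical.

```cpp
#include <cstdint>

enum class Target {
    ExattCharge, Temporary, P1ScoreDigit, P2ScoreDigit, ExtendsGained,
    Unmapped, // outside of the documented part of the memory map
};

// DS offsets of the per-player Extra Attack charge arrays.
constexpr uint16_t CHARGE_EXATT_BASE[2] = { 0x4B3E, 0x4B4E };

// Classifies the byte that the charge check would read and potentially
// reset for a given player (0 = P1, 1 = P2) and chain_slot value.
Target charge_check_target(int pid, uint8_t chain_slot) {
    const uint16_t addr = (CHARGE_EXATT_BASE[pid] + chain_slot);
    if (addr < 0x4B5E) {
        return Target::ExattCharge;
    }
    if ((addr <= 0x4B63) || ((addr >= 0x4B74) && (addr <= 0x4B7B))) {
        return Target::Temporary;
    }
    if (addr <= 0x4B6B) {
        return Target::P1ScoreDigit; // 8 BCD digits, 10's digit first
    }
    if (addr <= 0x4B73) {
        return Target::P2ScoreDigit;
    }
    if (addr == 0x4B7C) {
        return Target::ExtendsGained;
    }
    return Target::Unmapped;
}
```

Passing the top 8 bits of a segment value from the distribution table as `chain_slot` reproduces the rows of the slot table.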

The fix

So, we've got two glitches that may or may not appear on any given PC-98 setup, and whose effects depend on the combination of operating system, sound card model, and machine speed. That should make it abundantly clear that we can classify the underlying code issue as a landmine. There is nothing worth preserving about system-specific behavior on any ReC98 branch other than master, especially since we want to get away from the architecture in the long term.
It's also obvious that we must fix this to retain gameplay integrity in any build that wants to support netplay:

  • Gameplay would desync if the two players run different PC-98 setups with different glitch behavior. If the glitch on a local player's system then spawns an additional Extra Attack onto their simulation of the other player's field, it wouldn't appear in the remote player's own simulation. Thus, they wouldn't be able to see and dodge it. Obviously, we'd like to strive for crossplay between emulated (and even real) PC-98 systems and ports to modern systems, and only cut that feature if there are good arguments against it. But dumb system-specific glitches don't qualify in the slightest.
  • Also, we'd like to skip rendering during rollback. But that would also skip the call to enemies_render(), which is the whole reason why this glitch occurs with this exact frequency.

But how do we fix it? The offending call to chain_fire_charged_exatt() might look redundant at first, but it has definite effects on gameplay on both branches:

  • The explosion branch contains its own copy of the Extra Attack charge check, with a well-defined chain slot but a higher threshold value. A chain might only meet the lower requirement of the usual function.
  • The non-explosion branch uses the probably inactive chain slot at chain_ring_p[pid], which will be different from the uninitialized chain_slot value of 15 in the vast majority of cases. Two calls with two different chain slots might therefore cause a destroyed fireball to spawn two Extra Attacks if both charges are high enough.

The ideal fix is equally simple, though. Just initializing chain_slot to 15 removes all undefined behavior while simultaneously locking down the deterministic and observable effect of normal code flow. We can also easily implement this without breaking the original position-dependent binary by taking the 4 necessary additional bytes from the function's own bloat. And boy is there a lot of it; removing even just the single most obvious piece of bloat in this one function freed up 23 bytes, leaving 19 bytes unused after the fix. Any concerns about memory budgets for minor mods are vastly overblown.
This means that we're already up to the second release of TH03's debloated and Anniversary Editions!

Richard Stallman cosplaying as a shrine maiden ReC98 (version P0335) 2026-03-16-ReC98.zip


So that was 3,374 words to explain the ramifications of a single uninitialized local variable, leaving us at 5.375 pushes in total. What better way to round out this one than to finish the unfinished subproject from last time:

📝 As of last time, my Fediverse presence was only missing three pieces of data to function as a full-on replacement for Twitter:

  1. All posts since 2023-07-02, which had twice as much media attached to them as the previous 8½ years combined
  2. Alt text for images
  3. Polls and their results, if this is even possible without assigning each vote to an existing Fediverse user

With the latter two not being part of Twitter's data archive, I already expected that I'd have to cobble together some very awkward code to obtain this data. But I didn't expect that this would cast an even worse light on Twitter:

And I am certainly not paying just to efficiently retrieve data that should have been part of my data archive all along if I can at all avoid it.

Aren't there a few evil third-party projects that could help here? Nitter, for example, is well-known for retrieving Twitter timelines and displaying them on efficient, server-rendered, and easily scrapeable pages, bypassing the need for both activated JavaScript and a Twitter account just for reading posts. Sadly, it didn't render alt text on its frontend at the time in early February when I wanted to complete the Fediverse import. But it's open-source – and although I couldn't comprehend the exact mechanisms of the unofficial Twitter API they use, it might be the best starting point.

And sure enough, someone else previously needed alt text as well, found the string in the API response that Nitter was already processing, and wrote the code to pass it on to the frontend. All I had to do then was to cherry-pick these commits, adjust them for the current API's response schema, and run my own Nitter instance. This was the first time I hacked around in anything written in Nim, but I encountered no issues when building the project on Linux, although the build system is definitely on the slower end as far as systems languages are concerned.
Since this was ridiculous enough for what I wanted to do, I then pushed the updated versions of these commits, in the hope that someone else could save those 10-20 minutes of fixing merge conflicts. Since that issue was languishing for over three years, I certainly didn't expect that Nitter's maintainer would actually merge these commits 1½ weeks later. Even better, though! Thanks to ReC98, alt text is now shown on the main nitter.net instance. As of this blog post, the widget in Nitter's frontend still suffers from a visual overlapping bug in its unexpanded form when displaying multi-line alt text, but having that text in the server-rendered DOM at all is all I was asking for…

…except that my GoToSocial importer can't just automatically scrape this text from nitter.net because they prevent scraping, and have put a lot of effort into keeping it prevented. Specifically, any request made with the Python Requests library will return a 200 OK response code but an empty response body.
The Nitter wiki links a few other public instances, but none of those would work for us either. Some of these are still running a Nitter version older than 1c06a67, as of this blog post. And the ones that do run a current version employ similar scraping prevention techniques: They either respond with a 403 Forbidden, a redirect to a JavaScript challenge page, or just outright close the HTTP connection. Working around that would be way beyond reasonable, considering the budget that this task was supposed to take up… and if they don't want people to scrape them, I should respect that. Yet another example of AI crawlers ruining everything, I guess…
zedeus did recommend Twikit for programmatic access to Twitter data in the issue I linked above, but that library is currently unusable as well. A simple get_tweet_by_id() call returns 403 Forbidden, suggesting that you have to log in first, but any attempt to login() gets blocked by Cloudflare.

So yeah. You might only need to run such an import script once, but if you want a complete migration with alt text and polls, there's no way around temporarily self-hosting a Nitter instance, as far as I can tell. Lovely.
Speaking of polls, though…

Polls

With a scraping setup in place, retrieving poll data and calculating the exact votes from the given percentages on Nitter's frontend was no big deal. Instead, all of the annoyance here lies in then getting that data onto a Fediverse server:

  1. Very understandably, GoToSocial lacks an API to pre-fill poll results,
  2. but it also refuses to create backdated statuses with polls altogether.

These two points defeat any lasting automated migration code I could add to my GoToSocial importer. Printing out the imported data to stdout is the best I can do; any efficient method of importing backdated polls relies on removing the condition from 2) and compiling your own build of GoToSocial.
Thankfully, this is no big deal even when cross-compiling from Windows to Linux. With such a build, your POST /api/v1/statuses requests can at least include the poll's question labels and return the database ID of the newly created poll for further manipulation at the SQL level. After that, you can reformat the stdout output into a single query and run it manually:

UPDATE polls SET
	voters = 87,
	votes = '[25, 49, 11, 2]',
	expires_at = '2022-04-16 23:53:20.000000+00:00'
WHERE id = {poll_id};
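
The two steps from above, recovering vote counts and formatting them into such a query, could be sketched roughly like this in Python. This assumes that the scraped frontend shows whole-number percentages alongside the total vote count, and that every voter picked exactly one option. Both function names are made up for illustration and are not taken from the actual importer:

```python
def votes_from_percentages(percentages, total_votes):
    """Recover per-option vote counts from whole-number percentages."""
    exact = [p * total_votes / 100 for p in percentages]
    votes = [round(e) for e in exact]
    # Rounded percentages do not always add up to the total, so nudge the
    # options with the largest rounding error until the sum matches.
    diff = total_votes - sum(votes)
    step = 1 if diff > 0 else -1
    by_error = sorted(
        range(len(votes)),
        key=lambda i: (exact[i] - votes[i]) * step,
        reverse=True,
    )
    for i in by_error[:abs(diff)]:
        votes[i] += step
    return votes


def poll_update_sql(poll_id, votes, expires_at):
    """Format the query shown above (hypothetical helper).

    Assumes single-choice polls, so the number of voters equals the sum of
    all votes. Plain string formatting is only acceptable here because the
    query is reviewed and run manually.
    """
    return (
        "UPDATE polls SET\n"
        f"\tvoters = {sum(votes)},\n"
        f"\tvotes = '{list(votes)}',\n"
        f"\texpires_at = '{expires_at}'\n"
        f"WHERE id = {poll_id};"
    )


votes = votes_from_percentages([29, 56, 13, 2], 87)
sql = poll_update_sql("{poll_id}", votes, "2022-04-16 23:53:20.000000+00:00")
print(sql)
```

With these inputs, the printed query has the same shape as the one above, with the ID left as a placeholder to be filled in from the API response.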

But wait, really? You can manipulate a poll's results, both inside GoToSocial and on all other clients, with just a single SQL query? If these results don't need to be associated with specific users or at least their originating servers, doesn't this mean that polls on the Fediverse are inherently untrustworthy?
Of course, it makes little sense to verify poll results before displaying them on a server's own frontend. You can always just place a reverse proxy in front of the server and rig a poll in that way. But the inbox mechanism of ActivityPub already technically breaks anonymity across servers from the point of view of the server admin, so I would have expected both clients and other federated servers to at least run some sort of validation upon receiving a poll. Although that would definitely cause heavy load on servers and clients as these polls receive more and more answers, within a protocol that is already quite chatty. :thonk:

In any case, this kind of validation would have to exist on an entirely different layer that has nothing to do with ActivityPub. That protocol – or rather, the underlying ActivityStreams vocabulary – only specifies how Questions and their associated answers are sent between servers. But once these votes are on a server, they only need to be stored in the minimal way expected by the usual client API. Which, as these words suggest, actually isn't even part of ActivityPub: Although that standard does attempt to specify a client API in addition to the server-to-server protocol, its development and adoption have stalled. Instead, most clients in common use simply implement the API that Mastodon once came up with, as a de facto standard. Mastodon's poll feature was quickly implemented in the most basic way, and no one has ever even proposed adding any validation features on top, as far as I can tell. And so, the GET /api/v1/statuses/{id} endpoint of any Mastodon-compatible Fediverse server is simply expected to return

{
	…
	"poll": {
		"options": [
			{ "title": "Option 1", "votes_count": 0 },
			{ "title": "Option 2", "votes_count": 0 },
		]
	}
	…
}

without any reference to where these votes are coming from, expecting the client to fully trust the server as far as the legitimacy of these votes is concerned. Of course, I'm a pretty trustworthy person if I may say so myself, but this detail turns the Fediverse into a pretty bad place to conduct even just semi-serious polls…
This is also why I haven't pointed any of the old links to Twitter polls on this website to my ActivityPub server yet. As long as the @ReC98Project account still exists and you can still view individual tweets without a Twitter account, it remains the authoritative and ultimately more trustworthy home of these poll results.
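
From a client's point of view, such a response boils down to a handful of counters. A minimal Python sketch, using a made-up response body that is cut down to the fields shown above; the function name is hypothetical:

```python
import json

# Hypothetical response body, restricted to the fields shown above.
response_body = """
{
	"poll": {
		"options": [
			{ "title": "Option 1", "votes_count": 25 },
			{ "title": "Option 2", "votes_count": 49 }
		]
	}
}
"""


def poll_tally(status):
    """Return {option title: vote count} for a decoded status."""
    return {o["title"]: o["votes_count"] for o in status["poll"]["options"]}


tally = poll_tally(json.loads(response_body))
print(tally)
```

Nothing in these fields lets the client check who cast a vote, so the tally is only as trustworthy as the server that reported it.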

So in the end, centralized systems, and Twitter in particular, would still be indispensable for at least the occasional poll I tend to run. Bluesky, unfortunately, still doesn't natively support polls and instead forces people to build custom poll systems in the meantime. While slightly more trustworthy than Fediverse clients on paper, the ones I've seen are just Strawpoll clones that don't come with any kind of authentication, and often even allow people to place multiple votes by simply opening a new private browser window. Unsurprisingly, I therefore received several random and apparently botted answers within seconds of creating a poll on both the systems I've used so far, making their results even more unusable. One of them also went offline within roughly 6 months of me using it, taking its one associated post with it.
Or maybe I should just conduct polls by asking for thought-out replies and rewarding people with, like, of ReC98 budget going to a goal of their choice…

On the topic of Bluesky, though…

The hell is this video quality? You upload lossless AV1 through the bsky.app frontend, and they blow up the video to 15.5× its original size by applying every single one of the worst possible processing steps:

  1. For starters, they re-encode your video as H.264 using the High profile with YUV420P chroma subsampling. This is the same state of the art from 2005 that Twitter still expects you to produce yourself 21 years later. I'd expect any competitor to be better than that, but with Bluesky, that's just the start.
  2. Because then, they enforce 30 FPS, duplicating or dropping frames as needed. This is inexcusable and disgusting.
  3. They also bilinearly scale your video to 720 pixels in its shortest dimension. At least they maintain its original aspect ratio…
  4. But this scaling process also targets a constant bitrate of 3,000 kbit/s, regardless of whether the video actually needs that much. I'm getting flashbacks of a certain dumb and uninformed "if MP3, then 320kbps CBR, everything below sucks" mentality.
  5. Those bits would have been much better invested in keyframes. Of which we get exactly zero, causing seeking lags at consistent spots throughout these videos.

But maybe that's just bsky.app's backend doing its own video conversion. Maybe its developers did collect statistics and then made the conscious and informed decision that those exact processing steps would deliver the optimal trade-off between quality, traffic, and compatibility for the kind of HD content that people typically want to upload to such a platform nowadays. Maybe, my use case is just a rare exception… wait, gamers are unhappy too? 🤨
Surely then, the AT Protocol network still stores the original lossless video I uploaded, and an alternate client could play it directly, right? That's certainly what I would expect from a platform that advertises self-hosting, control over one's own data, and account portability.

So let's check Bluesky's data archive, and… the hell is a CAR file? A proprietary binary format in place of the standard .zip file you'd get from even the most evil corporations on the planet? Why the hell would they confront people trying to access their own data with a blog post containing Go code, directly linked from the data export modal? Usually, platforms are hard to leave due to network effects and people's inherent laziness to open another browser tab, but I've never seen a service also put up such a deliberate technical hurdle. How expensive could it have possibly been to set up a server that zips up all of a user's data on request? Why did they have to externalize these costs to their users?
The biggest joke, though: All code in that blog post is taken from this repository with example code, but the README in that directory says:

Check out the goat command line tool, which does the same thing and is actively maintained.

Except that the goat subcommand that actually downloads post-attached blobs, goat blob export, downloads directly from the AT Protocol network and doesn't even need the CAR file! Which, by the way, you could also download via goat repo export. And why wouldn't you? You need this tool to do anything with your data anyway, since no existing tooling works with this format!
Anyway. Let's look at the blob of one of those videos, and…

$ ffprobe -hide_banner 03_source.webm
Input #0, matroska,webm, from '03_source.webm':
  Metadata:
    ENCODER         : Lavf62.3.100
  Duration: 00:00:05.12, start: 0.000000, bitrate: 454 kb/s
  Stream #0:0:
    Video:
      av1 (libdav1d) (High), gbrp(pc, gbr/bt709/iec61966-2-1, progressive),
      288x368, SAR 1:1 DAR 18:23, 56.42 fps, 56.42 tbr, 1k tbn
    Metadata:
      ENCODER         : Lavc62.11.100 libaom-av1
      DURATION        : 00:00:05.122000000
$ ffprobe -hide_banner bafkreid7mp5u7mb32o2a5n7zl2hbfxkiiaiebh7x77vik27alu2hfvuq5q
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'bafkreid7mp5u7mb32o2a5n7zl2hbfxkiiaiebh7x77vik27alu2hfvuq5q':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf61.1.100
  Duration: 00:00:05.17, start: 0.000000, bitrate: 193 kb/s
  Stream #0:0[0x1](und):
    Video:
      h264 (High) (avc1 / 0x31637661), yuv420p(tv, unknown/bt709/iec61966-2-1, progressive),
      288x368 [SAR 1:1 DAR 18:23], 189 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc61.3.100 libx264

That's basically the same video before the scaling. Still YUV420P, still 30 FPS, still no keyframes.
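
For more than a handful of videos, this comparison could be automated with ffprobe's JSON output. The following Python sketch assumes ffprobe is available on the PATH; the function names are made up, and only the stream properties discussed above are compared:

```python
import json
import subprocess
from fractions import Fraction


def probe_video_stream(path):
    """Run ffprobe and return its description of the first video stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_streams", "-print_format", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["streams"][0]


def summarize(stream):
    """Reduce an ffprobe stream dict to codec, pixel format, size, and FPS."""
    return {
        "codec": stream["codec_name"],
        "pix_fmt": stream["pix_fmt"],
        "size": (stream["width"], stream["height"]),
        # ffprobe reports frame rates as fractions such as "30/1".
        "fps": float(Fraction(stream["avg_frame_rate"])),
    }


def changed_properties(source, blob):
    """Return {property: (source value, blob value)} for every difference."""
    a, b = summarize(source), summarize(blob)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

Calling changed_properties(probe_video_stream("03_source.webm"), probe_video_stream(blob_path)) with the path of a downloaded blob would then list every property that the platform changed.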
I hate everything about this – and most importantly, the fact that I even have to bother with this complete waste of human engineering effort just because these kinds of platform choices are decided by politics and marketing for the vast majority of people. If I didn't land the occasional viral hit on there, I would have discontinued Bluesky immediately after this discovery. Maybe I should just trade one corporate platform for another and start looking into Tumblr, as touhou-memories recommended…

Oh well. Next up: Headless Shuusou Gyoku, preparing the automated replay validation I've wanted for years. Since Shuusou Gyoku's new replay system is long overdue now, I will put all future budget from Ember2528 into that system until it's done. If you'd like more TH03 done sooner, please fund it separately – but with of free budget left, there should be plenty of room for everyone.
That said, my life priorities do seem to have shifted for good by now. So I might need to reduce the cap if all of you really end up commissioning that much… But let's see how fast things will actually progress in the future, now that this worst possible piece of reverse-engineering is finally done.