Disk State Storage
Context
During the life of a loadout, the application needs to track several states of the disk. This data includes roughly the following fields:
- Game Path - The path of a file, relative to a LocationId
- Hash - The xxHash64 of the file
- Size - The size of the file
- Last Modified - The last modified date of the file
The last two fields are rarely used directly. Instead, they serve as a cheap check for whether a file needs to be rehashed, which is a potentially expensive operation. If either the size or the last modified date of a file has changed, the application knows it must rehash that file. The size is included as a failsafe in case the user modified the file after resetting their system clock, when the last modified date alone cannot be trusted.
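A minimal sketch of that shortcut follows. Python is used only for brevity. The `DiskStateEntry` record, the `needs_rehash` helper, and the integer-ticks timestamp are assumptions for illustration, not the application's actual types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskStateEntry:
    # Hypothetical record mirroring the fields listed above.
    path: str
    hash: int           # xxHash64 of the file contents
    size: int           # size in bytes
    last_modified: int  # last modified time, assumed to be integer ticks


def needs_rehash(cached: DiskStateEntry, size: int, last_modified: int) -> bool:
    """Return True if the expensive xxHash64 pass must be repeated.

    A change in either field is enough. Checking the size as well as the
    timestamp acts as a failsafe when the timestamp cannot be trusted,
    for example after the system clock was changed.
    """
    return cached.size != size or cached.last_modified != last_modified
```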
Additionally, when a loadout is applied to the game, we want to record the disk state at that point as the "last applied" state, which is used by the 3-way compare in the sync process. Likewise, when we manage a game for the first time, we want to record the state of the disk so that new loadouts can be set up to match the unmodified state of the game.
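To make the role of the "last applied" state concrete, here is a hypothetical per-path decision table for a 3-way compare. The `Action` names and the exact rules are assumptions, not the synchronizer's actual logic. Hashes stand in for file contents, and `None` means the file is absent in that state.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NOTHING = "nothing"
    WRITE_TO_DISK = "write loadout file to disk"
    DELETE_FROM_DISK = "delete file from disk"
    INGEST_FROM_DISK = "ingest disk change into loadout"
    CONFLICT = "conflict"


def three_way_compare(disk: Optional[int],
                      previous: Optional[int],
                      loadout: Optional[int]) -> Action:
    """Decide what to do for one path.

    disk:     hash found by the latest scan
    previous: hash in the "last applied" state
    loadout:  hash the loadout wants at this path
    """
    if disk == loadout:
        return Action.NOTHING  # already in sync
    if disk == previous:
        # Disk is unchanged since the last apply, so the loadout changed.
        return Action.DELETE_FROM_DISK if loadout is None else Action.WRITE_TO_DISK
    if loadout == previous:
        # Loadout is unchanged, so the user changed the file on disk.
        return Action.INGEST_FROM_DISK
    return Action.CONFLICT  # both sides changed, in different ways
```

Without the "last applied" column, the second and third cases could not be told apart: a mismatch between disk and loadout alone does not say which side moved.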
Important Discoveries During Design
During the design of this new disk state storage, a few important discoveries were made:
All the disk states for a given game installation are linear in time
Since only one loadout can be applied at a time, and we try to keep loadouts and game state in sync, we can view the disk state as evolving linearly in time. This means we can leverage MnemonicDB's linear time model: we store pointers to transactions and use those as markers for the disk state. In other words, we don't need to store a copy of the disk state for every application of a loadout, only the TX pointer to the disk state that was synced while applying the loadout (see Implementation below).
Disk state is sorted by the path
Since the paths in the disk state are stored in MnemonicDB, and MnemonicDB keeps all data sorted, queries over disk states have an implicit ordering of all the paths. In addition, if we store these paths as (GameId, LocationId, Path) tuples, or as (LoadoutId, LocationId, Path) tuples, we can perform range queries over (GameId, ...) or (LoadoutId, ...) to get all the entities relevant to a given game or loadout, sorted by path. We can then perform a 3-way merge join on the sources of the synchronizer and remove much of the secondary indexing performed in previous implementations.
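The range query can be sketched over a sorted list of key tuples standing in for MnemonicDB's index. The `prefix_range` helper is hypothetical and assumes integer owner IDs. Python orders a shorter tuple before any longer tuple it prefixes, so `(owner_id,)` sorts just before the range and `(owner_id + 1,)` just after it.

```python
from bisect import bisect_left
from typing import List, Tuple

Key = Tuple[int, int, str]  # (GameId or LoadoutId, LocationId, RelativePath)


def prefix_range(sorted_keys: List[Key], owner_id: int) -> List[Key]:
    """Return all keys whose first element equals owner_id.

    Because the keys are already sorted, the matches form one contiguous
    run, located with two binary searches instead of a full scan.
    """
    start = bisect_left(sorted_keys, (owner_id,))
    end = bisect_left(sorted_keys, (owner_id + 1,))
    return sorted_keys[start:end]
```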
In addition, when we query an IndexSegment for a given prefix, a binary search can be used to find specific entries in the segment, avoiding a scan of the entire segment.
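Looking up a single entry works the same way. In this hypothetical `find_entry` helper, a plain sorted list again stands in for an IndexSegment.

```python
from bisect import bisect_left
from typing import List, Tuple

Key = Tuple[int, int, str]  # (GameId or LoadoutId, LocationId, RelativePath)


def find_entry(segment: List[Key], key: Key) -> int:
    """Return the index of key in the sorted segment, or -1 if absent.

    Needs O(log n) comparisons instead of scanning every entry.
    """
    i = bisect_left(segment, key)
    if i < len(segment) and segment[i] == key:
        return i
    return -1
```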
Finally, part of the synchronization process is to group loadout files by their location and then select a winner from any conflicting items. In the past this required a LINQ GroupBy operation that caused a lot of overhead. With all the paths loaded pre-sorted into a single IndexSegment, the group-by can be performed by finding runs of duplicate entries in the segment and passing each group around as a (start, end) range into the segment. Essentially, this means grouping can be done on the stack instead of the heap.
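Python cannot demonstrate stack versus heap allocation, but the index-range idea can still be sketched. The hypothetical `group_ranges` generator yields half-open `(start, end)` ranges without copying any group into its own collection, and `pick_winner` applies an assumed "highest priority wins" rule; the real conflict rule may differ.

```python
from typing import Iterator, List, Tuple

# (target path key, priority, item id); the target key is
# (LoadoutId, LocationId, RelativePath).
Item = Tuple[Tuple[int, int, str], int, str]


def group_ranges(items: List[Item]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) half-open ranges of items sharing a target path.

    Requires items to be sorted by target path, so duplicates are adjacent.
    """
    start = 0
    for i in range(1, len(items) + 1):
        if i == len(items) or items[i][0] != items[start][0]:
            yield start, i
            start = i


def pick_winner(items: List[Item], start: int, end: int) -> Item:
    """Assumed rule: the highest-priority item in the range wins."""
    best = max(range(start, end), key=lambda i: items[i][1])
    return items[best]
```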
Implementation
The implementation of the disk state storage is as follows:
GameInstallMetadata
Each game installation, when detected, causes the creation of a GameInstallMetadata entity. Disk state is attached to this entity, which also contains the following pointers:
- LastAppliedLoadout - The last loadout that was applied to the game
- LastAppliedLoadoutTransaction - The transaction that was created when the loadout was applied
- LastScannedDiskStateTransaction - The transaction that was created when the disk state was last synced
- InitialDiskStateTransaction - The transaction that was created when the game was first scanned
- DiskStateEntries (backref) - All the disk state entries that point to this game
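As an illustration only, the entity's shape can be modeled as a dataclass. Plain integers for entity and transaction IDs are an assumption, the `record_apply` helper is hypothetical, and DiskStateEntries is omitted because a backref is derived by querying the entries rather than stored on the entity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GameInstallMetadata:
    id: int
    last_applied_loadout: Optional[int] = None              # loadout entity ID
    last_applied_loadout_transaction: Optional[int] = None  # TX of that apply
    last_scanned_disk_state_transaction: Optional[int] = None
    initial_disk_state_transaction: Optional[int] = None    # TX of first scan

    def record_apply(self, loadout_id: int, tx_id: int) -> None:
        """Hypothetical helper: remember which loadout was applied and the
        transaction holding the disk state that apply produced."""
        self.last_applied_loadout = loadout_id
        self.last_applied_loadout_transaction = tx_id
```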
DiskStateEntry
Disk state entries are entities that contain the following fields:
- Path - (GameMetadataId, LocationId, RelativePath) the path of the file
- Hash - The xxHash64 of the file
- Size - The size of the file
- LastModified - The last modified date of the file
- GameMetadataId - The game that this disk state entry is associated with
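Similarly, here is a hypothetical model of a disk state entry, with a helper that mimics the index order. Sorting on the `path` tuple alone makes all entries for one game contiguous, grouped by location and then by relative path. The integer-ticks timestamp is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DiskStateEntry:
    path: Tuple[int, int, str]  # (GameMetadataId, LocationId, RelativePath)
    hash: int                   # xxHash64 of the file contents
    size: int                   # size in bytes
    last_modified: int          # last modified time, assumed integer ticks
    game_metadata_id: int       # owning GameInstallMetadata entity


def sort_by_path(entries: List[DiskStateEntry]) -> List[DiskStateEntry]:
    """Mimic the database's index order for the composite path key."""
    return sorted(entries, key=lambda e: e.path)
```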
LoadoutItemWithTargetPath
The LoadoutItemWithTargetPath model is also updated to use the (LoadoutId, LocationId, RelativePath) tuple for the target path, so that all the loadout items that reference a given target path in a loadout can be queried at any time via a range query.
Code Implementation
Once this structure was created, many more parts of the application could be simplified. For example, the synchronizer used to index data into several hashmaps; this can be replaced by a single range lookup in MnemonicDB. In addition, the Synchronize method is further simplified to update the disk state as it extracts, deletes, and otherwise updates files. There is no need to re-scan the disk after synchronization, as the disk state is known at the moment each file is modified.
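The idea of recording disk state while modifying files can be sketched as follows. The `synchronize` function and its `(hash, size, last_modified)` records are hypothetical, and the actual file I/O is left as comments. The point is that the returned changes can be committed in one transaction without re-scanning the disk.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[int, int, str]   # (GameMetadataId, LocationId, RelativePath)
Record = Tuple[int, int, int]  # (hash, size, last_modified)


def synchronize(disk_state: Dict[Path, Record],
                writes: Dict[Path, Record],
                deletes: List[Path]) -> Dict[Path, Optional[Record]]:
    """Apply file operations and return the disk-state changes to commit.

    Each write already knows the hash and size of the content it extracts,
    so the new state is recorded as the operation happens. None in the
    result marks a retraction.
    """
    changes: Dict[Path, Optional[Record]] = {}
    for path in deletes:
        # A real implementation would delete the file here.
        disk_state.pop(path, None)
        changes[path] = None
    for path, record in writes.items():
        # A real implementation would extract the file here and read the
        # last modified time back from the filesystem afterwards.
        disk_state[path] = record
        changes[path] = record
    return changes
```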
At any point the disk state can be queried via conn.AsOf(txId), where txId is the transaction ID of the desired point in time. Thus, to get the original disk state of a game, we can query conn.AsOf(game.InitialDiskStateTransaction), and to get the previously applied disk state, we can query conn.AsOf(game.LastAppliedLoadoutTransaction).
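A toy model of these point-in-time queries follows, assuming a transaction log where each transaction maps paths to hashes, with `None` as a retraction. The `as_of` function only imitates the semantics of conn.AsOf and is not MnemonicDB's API; `changed_since_initial` is a hypothetical helper showing that two stored transaction pointers suffice to diff the unmodified game against the last applied loadout.

```python
from typing import Dict, List, Optional, Tuple

# Toy log: transaction N maps path -> hash, where None retracts the path.
Log = List[Dict[str, Optional[int]]]


def as_of(log: Log, tx_id: int) -> Dict[str, int]:
    """Rebuild the disk state as it was right after transaction tx_id."""
    state: Dict[str, int] = {}
    for changes in log[: tx_id + 1]:
        for path, file_hash in changes.items():
            if file_hash is None:
                state.pop(path, None)
            else:
                state[path] = file_hash
    return state


def changed_since_initial(log: Log, initial_tx: int, last_applied_tx: int
                          ) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Map each path that differs between the two points in time to its
    (initial hash, last applied hash); None means absent."""
    before = as_of(log, initial_tx)
    after = as_of(log, last_applied_tx)
    return {path: (before.get(path), after.get(path))
            for path in sorted(before.keys() | after.keys())
            if before.get(path) != after.get(path)}
```

Neither snapshot is copied anywhere; only the two transaction IDs are kept.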
Implementation Status
All of the above structure has been implemented except for the 3-way merge join. That can be added later in development, since the current implementation already stores the disk state in the correct format. Likewise, the stack-based grouping operation is not yet implemented.
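For reference, the pending 3-way merge join could look roughly like the following sketch. The names are hypothetical. It assumes each source has already been range-queried and stripped of its owner prefix (GameId or LoadoutId) so rows compare on (LocationId, RelativePath), and that loadout conflicts were resolved first so no source contains duplicate keys.

```python
from typing import Iterator, List, Optional, Tuple

Key = Tuple[int, str]  # (LocationId, RelativePath)
Row = Tuple[Key, int]  # key plus file hash


def merge_join3(disk: List[Row], previous: List[Row], loadout: List[Row]
                ) -> Iterator[Tuple[Key, Optional[int], Optional[int], Optional[int]]]:
    """Walk three path-sorted sources in lockstep.

    Yields one tuple per distinct path with each source's hash (None where
    the path is absent). Each input is read once and no hash map is built.
    """
    sources = (disk, previous, loadout)
    pos = [0, 0, 0]
    while True:
        heads = [src[p][0] for src, p in zip(sources, pos) if p < len(src)]
        if not heads:
            return
        key = min(heads)  # smallest path not yet emitted
        hashes: List[Optional[int]] = []
        for i, src in enumerate(sources):
            if pos[i] < len(src) and src[pos[i]][0] == key:
                hashes.append(src[pos[i]][1])
                pos[i] += 1
            else:
                hashes.append(None)
        yield key, hashes[0], hashes[1], hashes[2]
```

Each output row carries the disk, last applied, and loadout hash for one path, which is the input a per-path 3-way compare needs.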