Relational Data Design

Representing complex data in a computational framework

Data layout and structure is a constant trade-off between performance, readability, maintenance, future-proofing, extendability, and reuse. While more complex tables can achieve higher performance and readability, the cost becomes future proofing, maintainability, and often scalability.

Moving towards a document store type of data storage is incredibly beneficial for their simplicity and flexibility. They are more akin to file systems where documents are accessed by name and have few limitations on how they are structured. This is very good for horizontal scalability since it is easy to add more hardware when data does not need to be consistent across multiple tables/machines.

The relational model is a very good fit for developing non-sparse data layouts. However, for widely scaled systems the relational model can be inappropriate. Larger calculations and process now run via map-reduce. The most valuable high level data processing techniques are often a combination of hardware-focused data manipulation layers being used with functional type algorithms.

Data Normalization

Relational models require atomicity when working with data. For our purposes, this means normalizing data to the level of a noun, or a nameable piece. Consider the following setup code:

const mesh_room: Mesh = loadMesh("room_mesh");
const mesh_room_start: Mesh = loadMesh("room_start_mesh");
const mesh_room_trapped: Mesh = loadMesh("room_trapped_mesh");
const mesh_key: Mesh = loadMesh("key_mesh");
const mesh_potion: Mesh = loadMesh("potion_mesh");
const mesh_armor: Mesh = loadMesh("armor_mesh");

const texture_room: Texture = loadTexture("room_texture");
const texture_room_start: Texture = loadTexture("room_start_texture");
const texture_room_trapped: Texture = loadTexture("room_trapped_texture");
const texture_key: Texture = loadTexture("key_texture");
const texture_potion: Texture = loadTexture("potion_texture");
const texture_armor: Texture = loadTexture("armor_texture");

const animation_key_bob: Animation = loadAnimation("key_bob_animation");

const k1: PickupId = createPickup(
  type_key,
  mesh_key,
  texture_key,
  tint_color_copper,
);
const k2: PickupId = createPickup(
  type_key,
  mesh_key,
  texture_key,
  tint_color_silver,
);
const k3: PickupId = createPickup(
  type_key,
  mesh_key,
  texture_key,
  tint_color_gold,
);
const p1: PickupId = createPickup(
  type_potion,
  mesh_potion,
  texture_potion,
  tint_color_green,
);
const p2: PickupId = createPickup(
  type_potion,
  mesh_potion,
  texture_potion,
  tint_color_purple,
);
const a1: PickupId = createPickup(
  type_armor,
  mesh_armor,
  texture_armor,
  null,
);

const r1: Room = createRoom(
  worldPosition(0, 0),
  mesh_room_start,
  texture_room_start,
  null,
);
const r2: Room = createRoom(
  worldPosition(-20, 0),
  mesh_room_trapped,
  texture_room_trapped,
  hpDamage(10),
);
const r3: Room = createRoom(
  worldPosition(-10, 20),
  mesh_room,
  texture_room,
  null,
);
const r4: Room = createRoom(
  worldPosition(-30, 20),
  mesh_room,
  texture_room,
  null,
);
const r5: Room = createRoom(
  worldPosition(20, 10),
  mesh_room_trapped,
  texture_room_trapped,
  hpDamage(10),
);

addDoor(r1, r2, null);
addDoor(r1, r3, k1);
setRoomAsSpecial(r1, starting_room, worldPosition(1, 1));

addPickup(r2, k1, worldPosition(-18, 2));
addDoor(r2, r1, null);
addDoor(r2, r4, k2);

addPickup(r3, k2, worldPosition(-8, 12));
addPickup(r3, p1, worldPosition(-7, 13));
addPickup(r3, a1, worldPosition(-8, 14));
addDoor(r3, r1, null);
addDoor(r3, r2, null);
addDoor(r3, r5, k3);

addPickup(r4, k3, worldPosition(-28, 14));
addPickup(r4, p2, worldPosition(-27, 13));
addDoor(r4, r2);

setRoomAsSpecial(r5, exit_room, null);

This snippet is fairly complicated, involves multiple objects referencing each other, and needs to orchestrated so that all pieces are created before they can be referenced. This results in staggered phases of setup and linking. In order to make this design more database-like, the data needs to be normalized into tables.

Meshes - 0NF

mesh_id mesh_name
mesh_room “room_mesh”
mesh_room_start “room_start_mesh”
mesh_room_trapped “room_trapped_mesh”
mesh_key “key_mesh”
mesh_potion “potion_mesh”
mesh_armor “armor_mesh”

Textures - 0NF

texture_id texture_name
texture_room “room_texture”
texture_room_start “room_start_texture”
texture_room_trapped “room_trapped_texture”
texture_key “key_texture”
texture_potion “potion_texture”
texture_armor “armor_texture”

Animations - 0NF

animation_id animation_name
animation_key_bob “key_bob_animation”

Pickups - 0NF

pickup_id mesh_id texture_id pickup_type color_tint animation
k1 mesh_key texture_key key copper animation_key_bob
k2 mesh_key texture_key key silver animation_key_bob
k3 mesh_key texture_key key gold animation_key_bob
p1 mesh_potion texture_potion potion green  
p2 mesh_potion texture_potion potion purple  
a1 mesh_armor texture_armor armor    

Rooms - 0NF

room_id mesh_id texture_id world_position pickups damage doors_to locked is_start is_end
r1 mesh_room_start texture_room_start 0, 0     r2,r3 r3 with k1 true (1,1) false
r2 mesh_room_trapped texture_room_trapped -20, 10 k1 10hp r1,r4 r4 with k2 false false
r3 mesh_room texture_room -10, 20 k2,p1,a1   r1,r2,r5 r5 with k3 false false
r4 mesh_room texture_room -30, 20 k3,p2   r2   false false
r5 mesh_room_trapped texture_room_trapped 20, 10   25hp     false true

Normalization

There are many stages of normalization, including six normal forms. It is critical to know them and understand why they exist. Opting for a data first approach instead of objects/classes allows for data to be manipulated together. It also allows for changes to the design now require fewer changes to the data.

First Normal Form

Ever cell contains one and only one atomic value. There should exist no array of values, no null entries, and every row should be distinct. All tables should have a unique primary key to assist with sorting and optimizing data access. The key should also be as small as possible and because of the uniqueness rule, every row has an implicit key. This can include using the whole row as a key, but this is to be avoided if possible.

On the topic of treating a row in a database as a key, think of the table as a set and an insert is checking if a combination exists. This can be useful if the number of possible values is small enough to be represented with a bit set. Bit sets take up significantly less memory and can be accessing in $O(1)$ time.

To normalize to first normal form, we need to first remove null values and treat them as new tables.

Pickups - 1NF

pickup_id mesh_id texture_id pickup_type
k1 mesh_key texture_key key
k2 mesh_key texture_key key
k3 mesh_key texture_key key
p1 mesh_potion texture_potion potion
p2 mesh_potion texture_potion potion
a1 mesh_armor texture_armor armor

PickupTints - 1NF

pickup_id color_tint
k1 copper
k2 silver
k3 gold
p1 green
p2 purple

PickupAnimations - 1NF

pickup_id animation
k1 animation_key_bob
k2 animation_key_bob
k3 animation_key_bob

First normal form will create more tables with fewer columns in those tables and will only create rows for things that matter. This means more memory usage, but it also means that we no longer need to check for null values in our code, removing unnecessary state.

Rooms - 1NF

room_id mesh_id texture_id world_position is_start is_end
r1 mesh_room_start texture_room_start 0, 0 true (1,1) false
r2 mesh_room_trapped texture_room_trapped -20, 10 false false
r3 mesh_room texture_room -10, 20 false false
r4 mesh_room texture_room -30, 20 false false
r5 mesh_room_trapped texture_room_trapped 20, 10 false true

PickupInstances - 1NF

room_id pickups
r2 k1
r3 k2
r3 p1
r3 a1
r4 p2
r4 k3

Doors - 1NF

room_id doors_to
r1 r2
r1 r3
r2 r1
r2 r4
r3 r1
r3 r2
r3 r5
r4 r2

LockedDoors - 1NF

room_id to_room locked_with
r1 r3 k1
r2 r4 k2
r3 r5 k3

Traps - 1NF

room_id damage
r2 10hp
r5 25hp

Laying out data like this takes up less space in larger projects as the number of null entries or arrays would have only increased with the increased complexity. Now we can add new features without having to revisit the original objects.

Second Normal Form

Second normal form is about trying to pull out columns that don’t depend on only a part of the primary key. This can happen when the table requires a compound primary key. Consider the following alternative representation of pickups

Pickups - 0NF

mesh_id texture_id pickup_type color_tint
mesh_key texture_key key copper
mesh_key texture_key key silver
mesh_key texture_key key gold
mesh_potion texture_potion potion green
mesh_potion texture_potion potion purple
mesh_armor texture_armor armor  

Pickups - 1NF

mesh_id texture_id pickup_type
mesh_key texture_key key
mesh_potion texture_potion potion
mesh_armor texture_armor armor

TintedPickups - 1NF

pickup_type color_tint
key copper
key silver
key gold
potion green
potion purple

Pickups - 2NF

pickup_id pickup_type
k1 key
k2 key
k3 key
p1 potion
p2 potion
a1 armor

PickupTints - 2NF

pickup_id color_tint
k1 copper
k2 silver
k3 gold
p1 green
p2 purple

PickupAssets - 2NF

mesh_id texture_id pickup_type
mesh_key texture_key key
mesh_potion texture_potion potion
mesh_armor texture_armor armor

PickupAnimations - 2NF

pickup_type animation
key animation_key_bob

Third Normal Form

Third Normal Form removes transitive dependencies on the primary key via another column in the table. For example, any room that has a mesh will also have a matching texture.

Rooms - 1NF

room_id mesh_id texture_id world_position is_start is_end
r1 mesh_room_start texture_room_start 0, 0 true (1,1) false
r2 mesh_room_trapped texture_room_trapped -20, 10 false false
r3 mesh_room texture_room -10, 20 false false
r4 mesh_room texture_room -30, 20 false false
r5 mesh_room_trapped texture_room_trapped 20, 10 false true

Rooms - 3NF

room_id texture_id world_position is_start is_end
r1 texture_room_start 0, 0 true (1,1) false
r2 texture_room_trapped -20, 10 false false
r3 texture_room -10, 20 false false
r4 texture_room -30, 20 false false
r5 texture_room_trapped 20, 10 false true

TexturesAndMeshes - 3NF

texture_id mesh_name texture_name
texture_room_start room_start_mesh room_start_texture
texture_room_trapped room_trapped_mesh room_trapped_texture
texture_room room_mesh room_texture
texture_texture_key key_mesh key_texture
texture_texture_potion potion_mesh potion_texture
texture_texture_armor armor_mesh armor_texture

Boyce-Codd Normal Form

Boyce-Codd Normal Form, BCNF, is normalizing the functional dependencies.

Rooms - BCNF

room_id world_position is_start is_end
r1 0, 0 true false
r2 -20, 10 false false
r3 -10, 20 false false
r4 -30, 20 false false
r5 20, 10 false true

Rooms - BCNF

texture_id is_start is_end
texture_room_start true false
texture_room false false
texture_room_trapped false true

Domain Key Normal Form

Domain key normal form considered to be the last normal forms, but for developing efficient data structures, it should be considered early and often. Domain knowledge is preferable when writing code as it makes immediate sense. Domain knowledge is the idea that data depends on other data, but only given information about the domain in which it resides. It can also be presenting a human interpretation of the data, but this is not always the case. Consider the following transformation:

TexturesAndMeshes - 3NF

texture_id mesh_name texture_name
texture_room_start room_start_mesh room_start_texture
texture_room_trapped room_trapped_mesh room_trapped_texture
texture_room room_mesh room_texture
texture_texture_key key_mesh key_texture
texture_texture_potion potion_mesh potion_texture
texture_texture_armor armor_mesh armor_texture

Assets - DKNF

asset_id stubbed_name
asset_room_start room_start_*
asset_room_trapped room_trapped_*
asset_room room_*
asset_texture_key key_*
asset_texture_potion potion_*
asset_texture_armor armor_*

Operations

Manipulating data will always be handled with either an insert, delete, or update. It’s considered best practice to only modify data using actions that would be commonly found in a database to avoid unexpected state complexity. Even though the data is now laid out like a database, we are not required to use a query language to access it.

Stream Processing

All data can be implemented as streams. Tables can be thought of as sets of all possible permutations of the attributes. Processing a set can be thought of as traversing the set and producing an output set. Given that sets are unordered, this means we can trivially introduce parallel processing.

Stream processing, instead of random access processing, means that we can process data without the need write to variables outside of the process. This means no side-effects and easy parallelization because the order of operations is irrelevant. This makes it easier to think about the system, inspect, debug, extend, and interrupt it.

References