We add load and save capabilities to the game using serialization.
We follow a strategy where we collect data in standard Python structures and ensure that they only contain data types we can save in a text file. For example, we can save numbers and strings in a text file, but not class instances. For structured data, we use lists and dictionaries with string keys.
Once we have collected such data, we can save it in many text formats, like JSON, YAML, or XML. Furthermore, if we need to save space, we can use the corresponding binary formats or create our own. Then, we restore data by inverting this process: we load data from a file and convert it back to Python objects.
This approach is usually called serialization (for saving) and deserialization (for loading). Here, we implement it with a serialization interface.
This post is part of the 2D Strategy Game series
We introduce a new IDataTransfert interface and implement it in the game state classes.
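The interface code itself is not reproduced in this post. Here is a minimal sketch, assuming the interface is an abstract base class from the abc module (the actual definition may differ), together with a hypothetical Counter class that implements it:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class IDataTransfert(ABC):
    """Sketch of the serialization interface (assumed to be ABC-based)."""

    @abstractmethod
    def gatherData(self) -> Dict[str, Any]:
        """Return a dictionary containing only serializable data."""

    @abstractmethod
    def takeData(self, data: Dict[str, Any]):
        """Update the object from serialized data (may keep references)."""


class Counter(IDataTransfert):
    """Hypothetical class implementing the interface, for illustration."""

    def __init__(self, value: int = 0):
        self.__value = value

    @property
    def value(self) -> int:
        return self.__value

    def gatherData(self) -> Dict[str, Any]:
        return {"value": self.__value}

    def takeData(self, data: Dict[str, Any]):
        self.__value = int(data["value"])


source = Counter(42)
restored = Counter()
restored.takeData(source.gatherData())
print(restored.value)  # 42
```

Using an abstract base class means Python refuses to instantiate any class that forgets one of the two methods.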
The gatherData() method returns a dictionary containing only serializable data. We use it recursively: for instance, the gatherData() method of the GameState class calls the one of the World class, and so on. As a result, we only need to call the method of GameState to get all the game state data. Note that the returned data can reference the actual data (not a copy). Consequently, we must not modify it, as it could unexpectedly change the game state data!
The takeData() method converts serialized data to update the object. We use the verb take instead of set or copy to denote that the method can keep references to the serialized data: once we give serialized data to this method, we must not use it afterward!
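To make these reference-sharing rules concrete, here is a small sketch with a hypothetical Inventory class whose gatherData() returns its internal list without copying and whose takeData() keeps the list it receives. Both misuses below silently change the object:

```python
from typing import Any, Dict, List


class Inventory:
    """Hypothetical class that shares references like gatherData()/takeData()."""

    def __init__(self):
        self.__items: List[str] = []

    @property
    def items(self) -> List[str]:
        return list(self.__items)  # a copy, for inspection only

    def gatherData(self) -> Dict[str, Any]:
        # No copy: the returned list IS the internal list
        return {"items": self.__items}

    def takeData(self, data: Dict[str, Any]):
        # No copy: we keep a reference to the caller's list
        self.__items = data["items"]


inventory = Inventory()
inventory.takeData({"items": ["sword"]})

gathered = inventory.gatherData()
gathered["items"].append("shield")  # wrong: this modifies the inventory
print(inventory.items)  # ['sword', 'shield']

data = {"items": ["bow"]}
inventory.takeData(data)
data["items"].clear()  # wrong: we gave data away, yet we still use it
print(inventory.items)  # []
```

Avoiding copies keeps saving and loading fast; the price is this discipline on the caller's side.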
Let's start with the Unit class and its three attributes. Its gatherData() method returns a dictionary where each key corresponds to an attribute:
def gatherData(self) -> Dict[str, Any]:
    ...  # serialize unitClass and properties (see below)
    return {
        "playerId": self.__playerId,
        "unitClass": unitClassName,
        "properties": properties
    }
For playerId, since it is an integer, we have nothing to do.
Enum conversion. For unitClass, which is a UnitClass, we must do a conversion. There are two possibilities: convert it to a number (UnitClass is an IntEnum) or to a string. We choose a string because it is more robust: if we change the correspondences between unit classes and numbers, this serialization will still be correct.
We can use the name property of the Enum class to convert instances to names automatically. For example, UnitClass.WORKER.name evaluates to "WORKER". We can also implement our own conversion method, which lets us choose different names and handle future changes in the unit classes. Furthermore, this approach allows us to load old saves even if we rename a unit class in the code.
We add a new toName() method to the UnitClass enumeration to convert instances to strings. It uses a dictionary to perform the conversion (no if...elif chain!):
class UnitClass(IntEnum):
    ...
    def toName(self) -> str:
        return unitClassId2Name[self]

unitClassId2Name = {
    UnitClass.WORKER: "Worker",
    ...
}
We use this new method in gatherData():
unitClassName = self.__unitClass.toName()
Dictionary conversion. The properties attribute is a dictionary that maps instances of UnitProperty to integers. Values need no conversion, but we must convert keys to strings:
properties = {}
defaultProperties = UnitProperties[self.__unitClass]
for prop, value in self.__properties.items():
    if value == defaultProperties[prop]:
        continue
    properties[prop.toName()] = value
In this code, we duplicate the dictionary, except that we replace enum keys with string keys. The toName() method of UnitProperty converts the property (the key) to a string. In lines 4-5, we skip properties that have their default value to reduce the size of the serialized data.
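The following self-contained sketch runs both directions for properties. The UnitProperty members, their names, and the default values are hypothetical stand-ins for the game's real ones; the conversion uses dictionaries, as the post does for unit classes:

```python
import copy
from enum import IntEnum
from typing import Dict


class UnitProperty(IntEnum):
    # Hypothetical properties, for illustration only
    LIFE = 1
    SPEED = 2

    def toName(self) -> str:
        return propertyId2Name[self]

    @staticmethod
    def fromName(name: str) -> "UnitProperty":
        return propertyName2Id[name]


propertyId2Name = {UnitProperty.LIFE: "Life", UnitProperty.SPEED: "Speed"}
propertyName2Id = {name: prop for prop, name in propertyId2Name.items()}

# Hypothetical default properties of one unit class
defaultProperties: Dict[UnitProperty, int] = {
    UnitProperty.LIFE: 10,
    UnitProperty.SPEED: 2,
}


def gatherProperties(properties: Dict[UnitProperty, int]) -> Dict[str, int]:
    serialized = {}
    for prop, value in properties.items():
        if value == defaultProperties[prop]:
            continue  # default values are not stored
        serialized[prop.toName()] = value
    return serialized


def takeProperties(serialized: Dict[str, int]) -> Dict[UnitProperty, int]:
    properties = copy.deepcopy(defaultProperties)  # start from the defaults
    for name, value in serialized.items():
        properties[UnitProperty.fromName(name)] = int(value)
    return properties


current = {UnitProperty.LIFE: 7, UnitProperty.SPEED: 2}
saved = gatherProperties(current)
print(saved)  # {'Life': 7}
print(takeProperties(saved) == current)  # True
```

Only the damaged LIFE value is stored: SPEED is restored from the defaults during deserialization.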
Deserialization. The takeData() method updates the unit attributes with serialized data:
def takeData(self, data: Dict[str, Any]):
    self.__playerId = int(data['playerId'])
    assert 0 <= self.__playerId < 5, f"Invalid player id {self.__playerId} in unit data"
    self.__unitClass = UnitClass.fromName(data['unitClass'])
    self.__properties = copy.deepcopy(UnitProperties[self.__unitClass])
    for name, value in data['properties'].items():
        prop = UnitProperty.fromName(name)
        self.__properties[prop] = value
For playerId, which is an integer, we do an int() cast even though the serialized value should already be an integer (line 2). This handles more cases: for instance, if the serialized data is a string that we can convert to an int, we can still ingest it. Furthermore, int() raises an exception if the conversion is not possible: it is much better to fail now, while loading, than at some random moment during the game!
For unitClass, we invert the conversion with a new fromName() method in UnitClass that converts a string to a UnitClass instance (line 4).
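The fromName() method is not shown in the post. A minimal sketch, assuming it inverts unitClassId2Name with a reverse dictionary built once. The members, their values, and the legacy "Peasant" entry (showing how an old save could survive a rename) are all hypothetical:

```python
from enum import IntEnum


class UnitClass(IntEnum):
    # Hypothetical members, for illustration only
    WORKER = 1
    SOLDIER = 2

    def toName(self) -> str:
        return unitClassId2Name[self]

    @staticmethod
    def fromName(name: str) -> "UnitClass":
        # Unknown names raise a ValueError with context, not a bare KeyError
        try:
            return unitClassName2Id[name]
        except KeyError:
            raise ValueError(f"Unknown unit class name {name!r}") from None


unitClassId2Name = {
    UnitClass.WORKER: "Worker",
    UnitClass.SOLDIER: "Soldier",
}

# Reverse mapping, derived from the forward one so they cannot diverge
unitClassName2Id = {name: unitClass for unitClass, name in unitClassId2Name.items()}
# Hypothetical legacy name: older saves could call workers "Peasant"
unitClassName2Id["Peasant"] = UnitClass.WORKER
```

For example, UnitClass.fromName(UnitClass.SOLDIER.toName()) gives back UnitClass.SOLDIER, and UnitClass.fromName("Peasant") gives UnitClass.WORKER.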
For properties, we first copy the default properties of the unit class (line 5). We use a deep copy to ensure there are no references to the default data: otherwise, all units of this class could share the same properties! Note that, in this specific case, a shallow copy (copy.copy()) does the same because the default properties contain no nested objects (only enum keys and integer values). Anyway, using a deep copy is a good habit: if we change the structure of the properties one day, the code is already ready.
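The difference only shows up with nested data. In this sketch, the default properties are a hypothetical dictionary containing a list, standing in for a possible future structure:

```python
import copy

# Hypothetical default properties with a nested list
defaultProperties = {"life": 10, "tags": ["land"]}

deep = copy.deepcopy(defaultProperties)
shallow = copy.copy(defaultProperties)

shallow["life"] = 5               # top-level change: defaults are safe
shallow["tags"].append("flying")  # nested change: the list is shared!

print(defaultProperties)  # {'life': 10, 'tags': ['land', 'flying']}
print(deep)               # {'life': 10, 'tags': ['land']}
```

With the shallow copy, every unit created from these defaults would suddenly fly; the deep copy is unaffected.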
Layers have four attributes: defaultValue, size, cells, and units. The first one is an integer, so we proceed as before.
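Tuple (de)serialization. The size attribute is a tuple, and some formats, like JSON, have no tuple type (the json package writes tuples as arrays and reads them back as lists). The first solution is to convert it to a list: list(self.__size). The second one is to convert it to an object (a dictionary with named fields), for example: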
data = {
    "width": self.__size[0],
    "height": self.__size[1]
}
The deserialization rebuilds the tuple, and since we know that it is a tuple of integers, we cast as we did previously:
self.__size = (int(data["width"]), int(data["height"]))
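A quick check with the json package shows why the tuple needs this treatment. The size value is illustrative:

```python
import json

size = (64, 48)

# A tuple goes through JSON as an array and comes back as a list
print(json.loads(json.dumps(size)))  # [64, 48]

# Object approach: named fields, rebuilt into a tuple with integer casts
data = json.loads(json.dumps({"width": size[0], "height": size[1]}))
restored = (int(data["width"]), int(data["height"]))
print(restored == size)  # True
```

Named fields also make the saved file self-describing: a reader does not have to guess which number is the width.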
Dense array (de)serialization. Numpy arrays have a tolist() method that converts their data into nested lists of Python scalars. Depending on the shape of the array, we get a list of lists (2D arrays), a list of lists of lists (3D arrays), etc.:
serializedArray = self.__cells.tolist()
Then, we can recreate the array from this serialized data with the array() function of the Numpy package:
self.__cells = np.array(serializedArray, dtype=np.int32)
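A self-contained round trip through JSON, with illustrative cell values. Converting with tolist() matters: json.dumps() raises a TypeError on NumPy scalars such as np.int32, while the Python ints produced by tolist() serialize fine:

```python
import json

import numpy as np

cells = np.array([[101, 101, 102], [101, 102, 102]], dtype=np.int32)

serializedArray = cells.tolist()    # nested lists of Python ints
text = json.dumps(serializedArray)  # works: no NumPy types left
print(text)  # [[101, 101, 102], [101, 102, 102]]

restored = np.array(json.loads(text), dtype=np.int32)
print(np.array_equal(restored, cells), restored.dtype)  # True int32
```

Passing dtype explicitly restores the exact element type, since a JSON file does not record it.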
We could stop here, but as usual, we want robust solutions that handle future changes. As for unit classes and properties, the association between numbers and cell types could change. For instance, in the current state of the game, 101 corresponds to a sea tile and 102 to a ground tile. We still want to load both old and new saves if these correspondences change.
During the serialization, we collect all values used in the layer:
cellsValues = {
    str(self.__defaultValue): CellValue(self.__defaultValue).toName()
}
for value in np.unique(self.__cells):
    cellsValues[str(value)] = CellValue(value).toName()
We start with the default value (lines 1-3) and store the mapping between this value and its name. We convert keys to strings because some formats do not support non-string keys in dictionaries. Then, we use the unique() function of the Numpy package to add every distinct value of the layer (lines 4-5).
The serialized data is then the array converted to lists and this map:
cellsData = {
    "data": self.__cells.tolist(),
    "values": cellsValues
}
During deserialization, we replace serialized values with the right ones:
serialCells = np.array(cellsData["data"], dtype=np.int32)
self.__cells = serialCells.copy()
for serialKey, valueName in cellsData["values"].items():
    serialValue = int(serialKey)
    value = CellValue.fromName(valueName)
    self.__cells[serialCells == serialValue] = value
The loop in line 3 iterates through all the correspondences we stored in the serialized data. In line 4, we convert the key to an integer because keys in a serialized dictionary are strings. In line 5, we build a CellValue from its name. At this step, serialValue is the value in the serialized data, and value is the corresponding one in our code. For instance, a GROUND_SEA cell could have a value of 143 in the serialized data and 101 in our program. Finally, we replace the serialized value with the right one (line 6). Note that we select cells in serialCells, the untouched serialized array (lines 1-2): if we tested self.__cells instead, a value we had just written could match a later key and be replaced a second time, for instance when two values swap their numbers.
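The sketch below simulates a save made by an older version of the game and loaded by a newer one where the sea and ground numbers are swapped. The two dictionaries replace the CellValue conversions, and GROUND_EARTH is a hypothetical name. It also runs a naive version that tests the array being modified:

```python
import numpy as np

# Hypothetical numbering: old version has sea=101, ground=102; new one swaps them
oldValue2Name = {101: "GROUND_SEA", 102: "GROUND_EARTH"}
newName2Value = {"GROUND_SEA": 102, "GROUND_EARTH": 101}

# Serialization with the old numbering (default value 101)
oldCells = np.array([[101, 102], [102, 101]], dtype=np.int32)
defaultValue = 101
cellsValues = {str(defaultValue): oldValue2Name[defaultValue]}
for value in np.unique(oldCells):
    cellsValues[str(value)] = oldValue2Name[int(value)]
cellsData = {"data": oldCells.tolist(), "values": cellsValues}

# Deserialization with the new numbering: masks come from the serialized array
serialCells = np.array(cellsData["data"], dtype=np.int32)
cells = serialCells.copy()
for serialKey, valueName in cellsData["values"].items():
    cells[serialCells == int(serialKey)] = newName2Value[valueName]
print(cells.tolist())  # [[102, 101], [101, 102]]

# Naive version: masks come from the array being modified
naive = serialCells.copy()
for serialKey, valueName in cellsData["values"].items():
    naive[naive == int(serialKey)] = newName2Value[valueName]
print(naive.tolist())  # [[101, 101], [101, 101]]
```

In the naive version, the sea cells first become 102 and are then caught by the second key, so every cell ends up as ground.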
Dense array compression. Some layers are mostly filled with the default value: in these cases, we can significantly reduce the size of the serialized data by storing only the non-default values.
We first evaluate if it is worth compressing the array values:
nonzeroX, nonzeroY = np.nonzero(self.__cells != self.__defaultValue)
if len(nonzeroX) >= 0.1 * self.__cells.size:
    ...  # dense serialization
else:
    ...  # sparse serialization
The first line computes the coordinates of the cells whose value differs from the default one: nonzeroX contains the x coordinates of these cells, and nonzeroY the y coordinates. Then, if at least 10% of the cells have a non-default value (line 2), we store values densely, as we did previously.
In the other case, we build a list of lists, each with three values: the x coordinate, the y coordinate, and the value:
data = []
for x, y in zip(nonzeroX, nonzeroY):
    value = self.__cells[x, y]
    data.append([int(x), int(y), int(value)])
The zip() built-in function builds tuples of values from different containers (or, more generally, iterables). In this example, the first tuple is x=nonzeroX[0], y=nonzeroY[0], the second is x=nonzeroX[1], y=nonzeroY[1], and so on.
We use this packed data as our serialized data, together with a new "format" key that tells the deserialization which encoding we chose (the dense serialization needs the same key with the "dense" value):
cellsData = {
    "format": "sparse",
    "data": data,
    "values": cellsValues
}
During the deserialization, we branch depending on the value of the "format" key:
if cellsData["format"] == "dense":
    ...  # dense deserialization
elif cellsData["format"] == "sparse":
    self.__cells = np.full([self.width + 2, self.height + 2],
                           self.__defaultValue, dtype=np.int32)
    for sparseData in cellsData["data"]:
        x = int(sparseData[0])
        y = int(sparseData[1])
        value = int(sparseData[2])
        self.__cells[x, y] = value
For the sparse case, we create an array full of default values (lines 4-5). Then, we iterate through the [x, y, value] lists of the packed data (lines 6-9). Finally, we use these values to fill the array (line 10).
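Here is a self-contained sketch of both branches as free functions. DEFAULT_VALUE is a hypothetical default, the value remapping is left out, and, unlike the post, the sparse case stores the array shape in a hypothetical "shape" key instead of deriving it from the layer size:

```python
from typing import Any, Dict

import numpy as np

DEFAULT_VALUE = 0  # hypothetical default value for this sketch


def gatherCells(cells: np.ndarray) -> Dict[str, Any]:
    nonzeroX, nonzeroY = np.nonzero(cells != DEFAULT_VALUE)
    if len(nonzeroX) >= 0.1 * cells.size:
        return {"format": "dense", "data": cells.tolist()}
    # Sparse: one [x, y, value] triple per non-default cell
    data = []
    for x, y in zip(nonzeroX, nonzeroY):
        data.append([int(x), int(y), int(cells[x, y])])
    # "shape" is an assumption of this sketch (the post uses the layer size)
    return {"format": "sparse", "shape": list(cells.shape), "data": data}


def takeCells(cellsData: Dict[str, Any]) -> np.ndarray:
    if cellsData["format"] == "dense":
        return np.array(cellsData["data"], dtype=np.int32)
    if cellsData["format"] == "sparse":
        cells = np.full(cellsData["shape"], DEFAULT_VALUE, dtype=np.int32)
        for x, y, value in cellsData["data"]:
            cells[int(x), int(y)] = int(value)
        return cells
    raise ValueError(f"Unknown cells format {cellsData['format']!r}")


cells = np.zeros((20, 10), dtype=np.int32)
cells[3, 4] = 7
cells[15, 9] = 2
saved = gatherCells(cells)
print(saved["format"], saved["data"])  # sparse [[3, 4, 7], [15, 9, 2]]
print(np.array_equal(takeCells(saved), cells))  # True
```

Here, 2 of the 200 cells are non-default, well below the 10% threshold, so two triples replace 200 values. Raising on an unknown format turns a corrupted or future save into a clear loading error.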
Units (de)serialization. The case of units is more complex because we have references to objects, which are impossible to store in serialized data. A solution is to collect all the objects (the units in this case) in a dictionary:
unitsObjects = {}
for unit in self.__units.values():
    unitsObjects[str(id(unit))] = unit.gatherData()
In line 2, we iterate through all units (the values() method of dictionaries only returns the values, ignoring the keys). Then, we need a technique that produces a unique identifier for each object. There are many possibilities, like using a unique property of objects, a checksum, counters, etc. However, Python already provides the id() built-in function, which returns an integer that is unique among all objects alive at the same time: this is exactly what we need, since all units exist while we serialize them! We use it to get a unique key for each unit (line 3). For each unit, we call its gatherData() method to get its serialized data.
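Finally, we serialize the units attribute as a sparse array, where we replace each unit with its identifier, and we group both parts under the "data" and "cells" keys that the deserialization reads:
unitsCell = []
for coords, unit in self.__units.items():
    unitsCell.append([
        coords[0], coords[1], id(unit)
    ])
serializedUnits = {
    "data": unitsObjects,
    "cells": unitsCell
}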
For the deserialization, we first build a dictionary that maps identifiers to units (instances of the Unit class):
unitsData = {}
for unitId, unitData in data["units"]["data"].items():
    unit = Unit()
    unit.takeData(unitData)
    unitsData[int(unitId)] = unit
We rebuild the units attribute with the sparse array deserialization, except that we convert identifiers back to units:
self.__units = {}
for unitData in data["units"]["cells"]:
    x = int(unitData[0])
    y = int(unitData[1])
    unitId = int(unitData[2])
    self.__units[(x, y)] = unitsData[unitId]
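To see the identifier technique end to end, here is a self-contained sketch that goes through JSON. The Unit class is a hypothetical, one-attribute stand-in for the game's Unit, and gatherUnits() and takeUnits() are hypothetical names for the code above:

```python
import json
from typing import Any, Dict, Tuple


class Unit:
    """Hypothetical, minimal unit with a single attribute."""

    def __init__(self, playerId: int = 0):
        self.playerId = playerId

    def gatherData(self) -> Dict[str, Any]:
        return {"playerId": self.playerId}

    def takeData(self, data: Dict[str, Any]):
        self.playerId = int(data["playerId"])


def gatherUnits(units: Dict[Tuple[int, int], Unit]) -> Dict[str, Any]:
    # Objects are stored once, keyed by str(id()): JSON keys must be strings
    objects = {str(id(unit)): unit.gatherData() for unit in units.values()}
    cells = [[x, y, id(unit)] for (x, y), unit in units.items()]
    return {"data": objects, "cells": cells}


def takeUnits(data: Dict[str, Any]) -> Dict[Tuple[int, int], Unit]:
    unitsById = {}
    for unitId, unitData in data["data"].items():
        unit = Unit()
        unit.takeData(unitData)
        unitsById[int(unitId)] = unit
    units = {}
    for x, y, unitId in data["cells"]:
        units[(int(x), int(y))] = unitsById[int(unitId)]
    return units


units = {(2, 3): Unit(1), (5, 1): Unit(2)}
restored = takeUnits(json.loads(json.dumps(gatherUnits(units))))
print({coords: unit.playerId for coords, unit in restored.items()})
# {(2, 3): 1, (5, 1): 2}
```

The identifiers only need to be consistent within one save file: after loading, the new Unit objects have different id() values, which does not matter.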
The serializations of the World and GameState classes use the same techniques we presented in the previous sections.
Save. We add a save() method to the GameState class to save the game state data. It first gets the serialized data thanks to the gatherData() method:
data = self.gatherData()
Then, we can write this data to a file using the json package (included in the Python standard library):
import json
with open(fileName, 'w', encoding='utf-8') as file:
    json.dump(data, file, ensure_ascii=False, indent=4)
We can customize the output format with the json.dump() arguments (see the documentation). In this example, we want UTF-8 data (probably the most portable encoding): we open the file with encoding='utf-8', and ensure_ascii=False writes non-ASCII characters as they are instead of escaping them. We also want indentation (larger files, but more readable).
We can also use the Pickle format included in the standard library. It is a binary format, so files are usually smaller and I/O faster, but they are not human-readable:
import pickle
with open(fileName, 'wb') as bfile:
    pickle.dump(data, bfile, protocol=4)
Protocol 4 has been supported since Python 3.4 and has good performance. Pickle is not perfect: it only works with Python (hardly interoperable with other languages), and it is insecure: loading a pickle file from an untrusted source can execute arbitrary code.
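The post only shows the saving side of Pickle. Loading is symmetric with pickle.load(). Here is a round-trip sketch, where the temporary directory and the data dictionary are illustrative assumptions:

```python
import os
import pickle
import tempfile

data = {"playerId": 1, "unitClass": "Worker", "properties": {"Life": 7}}

with tempfile.TemporaryDirectory() as folder:
    fileName = os.path.join(folder, "save.pickle")
    with open(fileName, 'wb') as bfile:
        pickle.dump(data, bfile, protocol=4)
    # Only unpickle files you trust: loading can execute arbitrary code
    with open(fileName, 'rb') as bfile:
        loaded = pickle.load(bfile)

print(loaded == data)  # True
```

pickle.load() detects the protocol from the file itself, so it takes no protocol argument.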
Another format is MessagePack (the msgpack package): it has implementations for many languages, and loading it does not execute arbitrary code the way unpickling can. However, you need to install a third-party package, and it does not handle as many types as Pickle. Note that, in our current case, it works perfectly with our serialized data:
import msgpack
with open(fileName, 'wb') as bfile:
    msgpack.dump(data, bfile)
Load. We add a load() method to the GameState class. It first loads the serialized data from a file, for instance, in the JSON case:
import json
with open(fileName, encoding='utf-8') as file:
    data = json.load(file)
We could simply call the takeData() method to update the game state. However, an error can occur anywhere and lead to an inconsistent game state. Consequently, we first try to use the serialized data in a temporary game state:
import copy
try:
    state = GameState()
    # The temporary state gets its own copy: takeData() may keep references
    state.takeData(copy.deepcopy(data))
except Exception as ex:
    raise ValueError(f"Error in file {fileName}") from ex
If there is an error, we raise an exception and thus leave the current game state (self) unchanged.
If there are no errors, we can safely use the data to update the game state:
self.takeData(data)
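Putting save and load together, here is a sketch with a hypothetical one-attribute GameState. With a single attribute, the trial run is not strictly needed, but the pattern is the same for a full game state. The second file contains invalid data, and the failed load leaves the state untouched:

```python
import copy
import json
import os
import tempfile
from typing import Any, Dict


class GameState:
    """Hypothetical, minimal game state with a single serialized attribute."""

    def __init__(self):
        self.turn = 0

    def gatherData(self) -> Dict[str, Any]:
        return {"turn": self.turn}

    def takeData(self, data: Dict[str, Any]):
        self.turn = int(data["turn"])  # raises ValueError on bad data

    def save(self, fileName: str):
        with open(fileName, 'w', encoding='utf-8') as file:
            json.dump(self.gatherData(), file, ensure_ascii=False, indent=4)

    def load(self, fileName: str):
        with open(fileName, encoding='utf-8') as file:
            data = json.load(file)
        try:
            # Trial run on a throwaway state that gets its own copy of the data
            GameState().takeData(copy.deepcopy(data))
        except Exception as ex:
            raise ValueError(f"Error in file {fileName}") from ex
        self.takeData(data)


failed = False
with tempfile.TemporaryDirectory() as folder:
    goodFile = os.path.join(folder, "good.json")
    badFile = os.path.join(folder, "bad.json")

    source = GameState()
    source.turn = 12
    source.save(goodFile)
    with open(badFile, 'w', encoding='utf-8') as file:
        json.dump({"turn": "not a number"}, file)

    state = GameState()
    state.load(goodFile)
    try:
        state.load(badFile)
    except ValueError:
        failed = True

print(state.turn, failed)  # 12 True
```

Chaining with "from ex" keeps the original error (here, the failed int() conversion) available in the traceback for debugging.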
In the next post, we add cities.