mafw.tools.file_tools
The module provides utilities for handling files and filenames, computing file hashes, and related tasks.
Functions
- file_checksum: Generates the hexadecimal digest of a file or a list of files.
- remove_widow_db_rows: Removes widow rows from a database table.
- verify_checksum: Verifies the validity of FileChecksumField values.
- mafw.tools.file_tools.file_checksum(filenames: str | Path | Sequence[str | Path], buf_size: int = 65536) → str
Generates the hexadecimal digest of a file or a list of files.
The digest is calculated using the sha256 algorithm.
- Parameters:
filenames (str, Path, list) – The filename or the list of filenames for digest calculations.
buf_size (int, Optional) – The buffer size in bytes for reading the input files. Defaults to 65536 (64 KiB).
- Returns:
The hexadecimal digest.
- Return type:
str
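As an illustration of buffered SHA-256 hashing, here is a minimal sketch built on the standard library. It is not the MAFw implementation: the name sketch_file_checksum is hypothetical, and feeding every file of a list into one shared hasher, in list order, is an assumption about how the combined digest is formed.

```python
import hashlib
from pathlib import Path


def sketch_file_checksum(filenames, buf_size=65536):
    """Return the SHA-256 hex digest of one file or of several files.

    Hypothetical helper, not part of MAFw. Assumption: for a list, all
    files are fed into the same hasher in the order given.
    """
    # Accept a single filename as well as a sequence of filenames.
    if isinstance(filenames, (str, Path)):
        filenames = [filenames]

    hasher = hashlib.sha256()
    for name in filenames:
        with open(name, 'rb') as stream:
            # Read in chunks so that large files never sit fully in memory.
            while chunk := stream.read(buf_size):
                hasher.update(chunk)
    return hasher.hexdigest()
```

Reading in fixed-size chunks keeps memory use bounded by buf_size regardless of the file size, which is the reason for exposing the buffer size as a parameter.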
- mafw.tools.file_tools.remove_widow_db_rows(models: list[Model | type[Model]] | Model | type[Model]) → None
Removes widow rows from a database table.
In the MAFw architecture, the database mainly provides I/O support to the various processors.
This means that the processor retrieves a list of items from a database table for processing and subsequently updates a result table with the newly generated outputs.
Very often the input and output data are not stored directly in the database, but in files saved on disk. In this case, the database just provides a valid path where the input (or output) data can be found.
From this point of view, a widow row is a database entry in which the file referenced by the FileNameField has been deleted. A typical example is the following: the user wants a certain processor to regenerate a given result stored inside an output file. Instead of setting up a complex filter so that the processor receives only this element to process, the user can delete the actual output file and ask the processor to process all new items.
The provided models can be either a list or a single element, representing either an instance of a DB model or a model class. If a model class is provided, then a select over all its entries is performed.
The function looks at all fields of type FileNameField and FileNameListField and checks whether each corresponds to an existing path or list of paths. If not, the corresponding row is removed from the DB table.
- Parameters:
models (list[Model | type[Model]] | Model | type[Model]) – A list or a single Model instance or Model class for widow rows removal.
- Raises:
TypeError – if models is not of the right type.
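The widow-row idea can be sketched independently of MAFw with the standard sqlite3 module. Everything below is an assumption made for illustration: the function name remove_missing_file_rows, the table layout, and the use of a plain text column in place of a FileNameField. It does not reproduce how remove_widow_db_rows works with Model instances or classes.

```python
import sqlite3
from pathlib import Path


def remove_missing_file_rows(conn, table, column):
    """Delete the rows of `table` whose `column` path is missing on disk.

    Hypothetical helper, not part of MAFw. `table` and `column` are
    interpolated into the SQL text, so they must be trusted identifiers.
    Returns the number of rows removed.
    """
    rows = conn.execute(f'SELECT rowid, {column} FROM {table}').fetchall()
    # A row is a "widow" when the file it points to no longer exists.
    widows = [(rowid,) for rowid, path in rows if not Path(path).exists()]
    conn.executemany(f'DELETE FROM {table} WHERE rowid = ?', widows)
    conn.commit()
    return len(widows)
```

After such a cleanup, a processor that handles only new items would see the deleted outputs as missing and regenerate them, which is the workflow described above.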
- mafw.tools.file_tools.verify_checksum(models: list[Model | type[Model]] | Model | type[Model]) → None
Verifies the validity of FileChecksumField values.
If in a model there is a FileChecksumField, this must be connected to a FileNameField or a FileNameListField in the same model. The goal of this function is to recalculate the checksum of the FileNameField / FileNameListField and compare it with the actual stored value. If the newly calculated value differs from the stored one, the corresponding row in the model will be removed, as it is no longer valid.
If a file is missing, the checksum is not recalculated and the row is removed right away.
This function can be CPU and I/O intensive and may take a long time, so use it with care, especially when dealing with large tables and large files.
- Parameters:
models (list[Model | type[Model]] | Model | type[Model]) – A list or a single Model instance or Model class for checksum verification.
- Raises:
TypeError – if models is not of the right type.
mafw.mafw_errors.ModelError – if the FileChecksumField refers to a FileNameField that does not exist.
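The decision rule described above (missing file: remove without hashing; checksum mismatch: remove; otherwise keep) can be sketched on plain (filename, stored checksum) pairs. The names sha256_of and split_valid_rows are hypothetical, and the sketch only classifies rows instead of deleting them from a database; it is not the MAFw implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(path, buf_size=65536):
    """Return the SHA-256 hex digest of a single file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, 'rb') as stream:
        while chunk := stream.read(buf_size):
            hasher.update(chunk)
    return hasher.hexdigest()


def split_valid_rows(rows):
    """Split (filename, stored_checksum) pairs into (valid, invalid) lists.

    Hypothetical helper, not part of MAFw.
    """
    valid, invalid = [], []
    for filename, stored in rows:
        path = Path(filename)
        if not path.exists():
            # Missing file: skip the expensive hashing, the row is invalid.
            invalid.append((filename, stored))
        elif sha256_of(path) != stored:
            # The file changed after the checksum was stored.
            invalid.append((filename, stored))
        else:
            valid.append((filename, stored))
    return valid, invalid
```

Checking for existence before hashing avoids reading any data for rows that would be discarded anyway, which matters when the table references many large files.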