sleepless.engine.trainer_torch#

Training script.

Functions

check_exist_logfile(logfile_name, arguments)

Check the existence of the logfile (trainlog.csv); if the logfile exists and the start epoch is still 0, the logfile is replaced.

check_gpu(device)

Check the device type and the availability of a GPU.

checkpointer_process(checkpointer, ...)

Process the checkpointer, save the final model and keep track of the best model.

create_logfile_fields(valid_loader, ...)

Create the logfile fields that will appear in trainlog.csv.

run(model, data_loader, valid_loader, ...)

Fits a CNN model using supervised learning and saves it to disk.

save_model_summary(output_folder, model)

Save a short summary of the model in a text file.

static_information_to_csv(...)

Save the static information in a CSV file.

torch_evaluation(model)

Context manager to turn ON/OFF model evaluation.

train_epoch(loader, model, optimizer, ...)

Trains the model for a single epoch (through all batches).

train_torch(model, training_set, ...)

Fits a CNN model using supervised learning and saves it to disk.

validate_epoch(loader, model, device, ...)

Processes input samples and returns the loss (a scalar).

write_log_info(epoch, current_time, ...)

Write log info in trainlog.csv.

sleepless.engine.trainer_torch.torch_evaluation(model)[source]#

Context manager to turn ON/OFF model evaluation. This context manager will turn evaluation mode ON on entry and turn it OFF when exiting the with statement block.

Parameters:

model (Module) – pytorch network

Yields:

model (pytorch network)
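
A minimal usage sketch, assuming the context manager calls model.eval() on entry and model.train() on exit, as the description states:

    import torch
    from sleepless.engine.trainer_torch import torch_evaluation

    model = torch.nn.Linear(4, 2)  # stand-in network

    with torch_evaluation(model) as m:
        assert not m.training      # evaluation mode is ON inside the block
        predictions = m(torch.randn(8, 4))

    assert model.training          # training mode is restored on exit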

sleepless.engine.trainer_torch.check_gpu(device)[source]#

Check the device type and the availability of a GPU.

Parameters:

device (device) – device to use

sleepless.engine.trainer_torch.save_model_summary(output_folder, model)[source]#

Save a short summary of the model in a text file.

Parameters:
  • output_folder – output path

  • model – pytorch network

Return type:

tuple[str, int]

Returns:

r: The model summary in text format; n: The number of parameters of the model.
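
A hypothetical sketch of what this helper could do (the summary file name and exact format are assumptions, not the actual implementation):

    import os
    import torch

    def save_model_summary_sketch(output_folder: str, model: torch.nn.Module) -> tuple[str, int]:
        r = repr(model)  # torch modules render their layer hierarchy as text
        n = sum(p.numel() for p in model.parameters() if p.requires_grad)
        path = os.path.join(output_folder, "model-summary.txt")  # assumed file name
        with open(path, "w") as f:
            f.write(f"{r}\nTotal number of parameters: {n}\n")
        return r, n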

sleepless.engine.trainer_torch.static_information_to_csv(static_logfile_name, device, n)[source]#

Save the static information in a CSV file.

Parameters:
  • static_logfile_name (str) – The static logfile name, formed by joining the output folder and “constant.csv”

  • device (device) – device to use

  • n (int) – The number of parameters of the model
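
A minimal sketch of the kind of one-row CSV this could produce (column names are assumptions):

    import csv
    import torch

    def static_information_to_csv_sketch(static_logfile_name: str, device: torch.device, n: int) -> None:
        # values that stay constant throughout training, written once
        with open(static_logfile_name, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["device", "number_of_parameters"])
            writer.writeheader()
            writer.writerow({"device": str(device), "number_of_parameters": n})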

sleepless.engine.trainer_torch.check_exist_logfile(logfile_name, arguments)[source]#

Check the existence of the logfile (trainlog.csv); if the logfile exists and the start epoch is still 0, the logfile is replaced.

Parameters:
  • logfile_name (str) – The logfile name, formed by joining the output folder and trainlog.csv

  • arguments (dict) – start and end epochs
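
A sketch of the described behaviour; the dictionary key used below is a guess, since only “start and end epochs” is documented:

    import os

    def check_exist_logfile_sketch(logfile_name: str, arguments: dict) -> None:
        # a stale trainlog.csv from a previous run is discarded when training
        # (re)starts from epoch 0; the key name "epoch" is an assumption
        if os.path.exists(logfile_name) and arguments.get("epoch", 0) == 0:
            os.remove(logfile_name)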

sleepless.engine.trainer_torch.create_logfile_fields(valid_loader, extra_valid_loaders, device)[source]#

Create the logfile fields that will appear in trainlog.csv.

Parameters:
  • valid_loader (DataLoader) – To be used to validate the model and enable automatic checkpointing. If set to None, then do not validate it.

  • extra_valid_loaders (list[DataLoader]) – To be used to validate the model; does not affect automatic checkpointing. If set to None or empty, nothing extra is logged. Otherwise, an extra column with the loss of every dataset in this list is kept in the final training log.

  • device (device) – device to use

Return type:

tuple

Returns:

The fields that will appear in trainlog.csv
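
An illustrative sketch of how such a tuple of fields might be assembled (all column names are assumptions):

    def create_logfile_fields_sketch(valid_loader, extra_valid_loaders, device):
        fields = ("epoch", "total_time", "eta", "loss", "learning_rate")
        if valid_loader is not None:
            fields += ("validation_loss",)
        if extra_valid_loaders:
            fields += tuple(
                f"extra_validation_loss_{i}" for i in range(len(extra_valid_loaders))
            )
        # resource columns (CPU and, depending on `device`, GPU) would follow
        return fields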

sleepless.engine.trainer_torch.train_epoch(loader, model, optimizer, device, criterion, batch_chunk_count)[source]#

Trains the model for a single epoch (through all batches).

Parameters:
  • loader (torch.utils.data.DataLoader) – To be used to train the model

  • model – pytorch network

  • optimizer – pytorch optimizer

  • device – device to use

  • criterion – pytorch loss function

  • batch_chunk_count – If this number is different from 1, each batch is divided into this number of chunks, and gradients are accumulated over the chunks to perform each mini-batch update. This is particularly useful when GPU RAM is limited but one would like to keep training with larger batches, at the cost of longer processing times. To better understand gradient accumulation, read https://stackoverflow.com/questions/62067400/understanding-accumulated-gradients-in-pytorch. A generic sketch follows this entry.

Returns:

A floating-point value corresponding to the weighted average of this epoch’s loss
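
A generic PyTorch gradient-accumulation sketch illustrating the batch_chunk_count mechanism (not the module’s actual code):

    import torch

    def train_epoch_sketch(loader, model, optimizer, device, criterion, batch_chunk_count):
        model.train()
        losses = []
        for samples, targets in loader:
            optimizer.zero_grad()
            chunks = zip(samples.chunk(batch_chunk_count), targets.chunk(batch_chunk_count))
            for chunk, chunk_targets in chunks:
                outputs = model(chunk.to(device))
                # scale so that accumulated gradients match the full-batch gradient
                loss = criterion(outputs, chunk_targets.to(device)) / batch_chunk_count
                loss.backward()  # gradients accumulate across chunks
                losses.append(loss.item() * batch_chunk_count)
            optimizer.step()  # a single parameter update per full batch
        return sum(losses) / len(losses)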

sleepless.engine.trainer_torch.validate_epoch(loader, model, device, criterion, pbar_desc)[source]#

Processes input samples and returns the loss (a scalar).

Parameters:
  • loader – To be used to validate the model

  • model – pytorch network

  • device – device to use

  • criterion – loss function

  • pbar_desc – A string for the progress bar descriptor

Returns:

A floating-point value corresponding to the weighted average of this epoch’s loss
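
A minimal validation-loop sketch reusing the torch_evaluation() context manager documented above (the tqdm progress bar and the unweighted mean are assumptions):

    import torch
    from tqdm import tqdm  # assumed progress-bar dependency
    from sleepless.engine.trainer_torch import torch_evaluation

    def validate_epoch_sketch(loader, model, device, criterion, pbar_desc):
        losses = []
        with torch.no_grad(), torch_evaluation(model) as m:
            for samples, targets in tqdm(loader, desc=pbar_desc):
                outputs = m(samples.to(device))
                losses.append(criterion(outputs, targets.to(device)).item())
        return sum(losses) / len(losses)  # simple mean for brevity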

sleepless.engine.trainer_torch.checkpointer_process(checkpointer, checkpoint_period, valid_loss, lowest_validation_loss, arguments, epoch, max_epoch)[source]#

Process the checkpointer, save the final model and keep track of the best model.

Parameters:
  • checkpointer (Checkpointer) – checkpointer implementation

  • checkpoint_period (int) – save a checkpoint every n epochs. If set to 0 (zero), then do not save intermediary checkpoints

  • valid_loss (float) – Current epoch validation loss

  • lowest_validation_loss (float) – Keeps track of the best (lowest) validation loss

  • arguments (dict) – start and end epochs

  • epoch (int) – current epoch

  • max_epoch (int) – the end epoch

Return type:

float

Returns:

The lowest validation loss currently observed
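
The bookkeeping it performs roughly amounts to the following sketch (the checkpointer call signature and checkpoint names are assumptions):

    def checkpointer_process_sketch(checkpointer, checkpoint_period, valid_loss,
                                    lowest_validation_loss, arguments, epoch, max_epoch):
        # periodic checkpoint every `checkpoint_period` epochs; 0 disables this
        if checkpoint_period and epoch % checkpoint_period == 0:
            checkpointer.save(f"model_{epoch:03d}", **arguments)
        # track and snapshot the best (lowest validation loss) model
        if valid_loss is not None and valid_loss < lowest_validation_loss:
            lowest_validation_loss = valid_loss
            checkpointer.save("model_lowest_valid_loss", **arguments)
        # always save the final model at the last epoch
        if epoch >= max_epoch:
            checkpointer.save("model_final", **arguments)
        return lowest_validation_loss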

sleepless.engine.trainer_torch.write_log_info(epoch, current_time, eta_seconds, loss, valid_loss, extra_valid_losses, optimizer, logwriter, logfile, resource_data)[source]#

Write log info in trainlog.csv.

Parameters:
  • epoch (int) – Current epoch

  • current_time (float) – Current training time

  • eta_seconds (float) – Estimated time-of-arrival, taking previous epoch performance into consideration

  • loss (float) – Current epoch’s training loss

  • valid_loss (Optional[float]) – Current epoch’s validation loss

  • extra_valid_losses (Optional[list[float]]) – Validation losses from other validation datasets being currently tracked

  • optimizer (Optimizer) – pytorch optimizer

  • logwriter (DictWriter) – Dictionary writer that gives the ability to write rows to trainlog.csv

  • logfile (TextIOWrapper) – text file containing the logs

  • resource_data (tuple) – Monitored resources at the machine (CPU and GPU)
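
In essence, one CSV row is appended per epoch; a minimal sketch (column names are assumptions and must match the DictWriter fieldnames):

    def write_log_info_sketch(epoch, current_time, eta_seconds, loss, valid_loss,
                              extra_valid_losses, optimizer, logwriter, logfile,
                              resource_data):
        row = {
            "epoch": epoch,
            "total_time": current_time,
            "eta": eta_seconds,
            "loss": loss,
            "learning_rate": optimizer.param_groups[0]["lr"],
        }
        if valid_loss is not None:
            row["validation_loss"] = valid_loss
        # extra validation losses and `resource_data` columns would be added here
        logwriter.writerow(row)
        logfile.flush()  # make the new row visible on disk immediately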

sleepless.engine.trainer_torch.run(model, data_loader, valid_loader, extra_valid_loaders, optimizer, scheduler, criterion, checkpointer, checkpoint_period, device, arguments, output_folder, monitoring_interval, batch_chunk_count, criterion_valid, patience)[source]#

Fits a CNN model using supervised learning and saves it to disk. This method supports periodic checkpointing and the output of a CSV-formatted log with the evolution of some figures during training. A simplified loop sketch follows the parameter list.

Parameters:
  • model – pytorch network

  • data_loader – To be used to train the model

  • valid_loader – To be used to validate the model and enable automatic checkpointing. If None, then do not validate it.

  • extra_valid_loaders – To be used to validate the model, however does not affect automatic checkpointing. If empty, then does not log anything else. Otherwise, an extra column with the loss of every dataset in this list is kept on the final training log.

  • optimizer – pytorch optimizer

  • scheduler – pytorch scheduler

  • criterion – loss function

  • checkpointer – checkpointer implementation

  • checkpoint_period – save a checkpoint every n epochs. If set to 0 (zero), then do not save intermediary checkpoints

  • device – device to use

  • arguments – start and end epochs

  • output_folder – output path

  • monitoring_interval – interval, in seconds (or fractions), through which we should monitor resources during training.

  • batch_chunk_count – If this number is different from 1, each batch is divided into this number of chunks, and gradients are accumulated over the chunks to perform each mini-batch update. This is particularly useful when GPU RAM is limited but one would like to keep training with larger batches, at the cost of longer processing times.

  • criterion_valid – specific loss function for the validation set
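
Conceptually, run() wires the functions documented above into a loop like the following simplified sketch (the argument keys, and the omitted extra-validation, patience, and resource-monitoring details, are assumptions):

    import time
    from sleepless.engine.trainer_torch import (
        train_epoch, validate_epoch, checkpointer_process, write_log_info)

    def run_sketch(model, data_loader, valid_loader, optimizer, scheduler,
                   criterion, checkpointer, checkpoint_period, device, arguments,
                   batch_chunk_count, logwriter, logfile):
        lowest = float("inf")
        start_epoch, max_epoch = arguments["epoch"], arguments["max_epoch"]  # assumed keys
        for epoch in range(start_epoch + 1, max_epoch + 1):
            t0 = time.time()
            loss = train_epoch(data_loader, model, optimizer, device, criterion,
                               batch_chunk_count)
            valid_loss = None
            if valid_loader is not None:
                valid_loss = validate_epoch(valid_loader, model, device, criterion,
                                            "validation")
            scheduler.step()
            lowest = checkpointer_process(checkpointer, checkpoint_period, valid_loss,
                                          lowest, arguments, epoch, max_epoch)
            elapsed = time.time() - t0
            write_log_info(epoch, elapsed, elapsed * (max_epoch - epoch), loss,
                           valid_loss, [], optimizer, logwriter, logfile, ())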

sleepless.engine.trainer_torch.train_torch(model, training_set, validation_set, output_folder, model_parameters)[source]#

Fits a CNN model using supervised learning and saves it to disk. This method supports periodic checkpointing and the output of a CSV-formatted log with the evolution of some figures during training.

Parameters:
  • model (Module) – pytorch network

  • training_set – To be used to train the model

  • validation_set – To be used to validate the model and enable automatic checkpointing. If None, then do not validate it.

  • output_folder (str) – path to save the model and parameters

  • model_parameters (Mapping) –

    a dictionary where the following keys need to be defined (an illustrative example follows this entry):

        optimizer: torch.optim.Optimizer
        epochs: int
        batch_size: int
        valid_batch_size: int
        batch_chunk_count: int
        drop_incomplete_batch: bool
        criterion: pytorch loss function
        scheduler: torch.optim scheduler
        checkpoint_period: int
        device: str
        seed: int
        parallel: int
        monitoring_interval: int | float

    and optionally:

    criterion_valid: pytorch loss function
    patience: int
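
An illustrative mapping satisfying the keys above (all concrete values are placeholders; passing instances for optimizer and scheduler, and patience as an early-stopping epoch count, are assumptions):

    import torch

    model = torch.nn.Linear(4, 2)  # stand-in network
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    model_parameters = {
        "optimizer": optimizer,
        "epochs": 100,
        "batch_size": 32,
        "valid_batch_size": 32,
        "batch_chunk_count": 1,     # >1 enables gradient accumulation
        "drop_incomplete_batch": False,
        "criterion": torch.nn.CrossEntropyLoss(),
        "scheduler": torch.optim.lr_scheduler.StepLR(optimizer, step_size=30),
        "checkpoint_period": 10,    # 0 disables intermediary checkpoints
        "device": "cpu",
        "seed": 42,
        "parallel": -1,
        "monitoring_interval": 5.0,
        # optional keys:
        "criterion_valid": torch.nn.CrossEntropyLoss(),
        "patience": 10,             # assumed: early-stopping patience in epochs
    }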