Tactic: Use Checkpoints During Training
Tactic sort: Awesome Tactic
Type: Architectural Tactic
Category: green-ml-enabled-systems
Title
Use Checkpoints During Training
Description
Training is an energy-intensive stage of the machine learning life cycle and may run for long periods of time. A failure or hardware error can terminate the training process before it completes, in which case training must be restarted from the beginning. Checkpoints, however, save the model's state at regular intervals, so that after a premature termination training can resume from the last checkpoint instead of from scratch (Shanbhag et al., 2022). Using checkpoints during training therefore improves the robustness of an ML system.
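A minimal sketch of the tactic, assuming PyTorch; the model, training data, checkpoint interval, and the file name checkpoint.pt are illustrative assumptions rather than part of the tactic itself:

```python
import os
import torch
import torch.nn as nn

CHECKPOINT_PATH = "checkpoint.pt"  # hypothetical checkpoint file

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
start_epoch = 0

# Resume from the last checkpoint if a previous run was terminated early.
if os.path.exists(CHECKPOINT_PATH):
    state = torch.load(CHECKPOINT_PATH)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_epoch = state["epoch"] + 1

x, y = torch.randn(64, 10), torch.randn(64, 1)  # dummy training data

for epoch in range(start_epoch, 20):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    # Save model and optimizer state at regular intervals (here: every
    # epoch), so training can continue from this point after a failure.
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "epoch": epoch},
        CHECKPOINT_PATH,
    )
```

Saving the optimizer state alongside the model weights lets the resumed run continue as if it had never been interrupted; checkpointing only the weights would discard the optimizer's internal state.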
Participant
Data Scientist
Related software artifact
Memory
Context
Machine Learning
Software feature
< unknown >
Tactic intent
Improve energy efficiency by using checkpoints during training to prevent the loss of training progress after a premature termination, which would otherwise require restarting the process from the beginning and thereby increase energy consumption.
Target quality attribute
Recoverability
Other related quality attributes
Energy Efficiency
Measured impact
< unknown >