Weights and Biases Sweep with Model Timeouts

Kevin Horecka
3 min read · Apr 16, 2020

I was recently trying to use the Weights and Biases package to track some model components. They have a great feature called Sweeps which makes it easy to track hyperparameter tuning. However, it does not include an option to let your model time out after a certain amount of time. Given that some sets of hyperparameters can result in insanely long runtimes, I decided I needed a way to do this. This may not be the best (and certainly isn’t the only) way to make it work, but it gets the job done.

You can find the full code here: https://github.com/kevroy314/wandb-sweeps-timeout

The AsyncModelTimeout Object

To accomplish this, an object called AsyncModelTimeout wraps up a Python multiprocessing.Process object.

from multiprocessing import Process, Manager
import time
import logging

logging.basicConfig(level=logging.DEBUG)

# This wrapper could be used to spawn multiple parallel workers,
# but here we use it just to capture the returned metric value
def _func_wrapper(func, procnum, return_dict):
    return_dict[procnum] = func()

class AsyncModelTimeout():
    """This class can wrap a model training function and ensure it stops in a specified timeout."""
    def __init__(self, target_function, timeout_in_seconds, logging_name=None):
        self.target_function = target_function
        self.timeout_in_seconds = timeout_in_seconds
        self.logger = logging.getLogger('AsyncModelTimeout' if logging_name is None else logging_name)

    def run(self):
        manager = Manager()
        return_dict = manager.dict()
        action_process = Process(target=_func_wrapper, args=(self.target_function, 0, return_dict))
        action_process.start()
        action_process.join(timeout=self.timeout_in_seconds)
        if action_process.is_alive():
            action_process.terminate()
            self.logger.warning("Model timed out. Terminated.")
            return False, None  # No metric exists, so we return None
        else:
            self.logger.info("Model completed successfully.")
            return True, return_dict[0]

The usage is pretty straightforward:

def do_work():
    for i in range(3):  # Run for 3 seconds
        print(i)
        time.sleep(1)
    return "yay"

async_model = AsyncModelTimeout(do_work, 1)  # 1-second timeout
print(async_model.run())  # Will fail: (False, None)
async_model = AsyncModelTimeout(do_work, 4)  # 4-second timeout
print(async_model.run())  # Will succeed: (True, 'yay')

Running a Weights and Biases Sweep

To run with WandB sweeps, you just wrap up your model (here, an example RandomForestClassifier):

import time
import wandb
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Define how to run the model asynchronously
def run_model(config, X_train, X_test, y_train, y_test):
    # Just a demo model, but it needs the config
    clf = RandomForestClassifier(random_state=42, **config)
    # Time the fit and predictions
    t0 = time.time()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    fit_predict_time = time.time() - t0
    # Score it
    score = f1_score(y_test, y_pred)
    wandb.log({"fit_predict_time": fit_predict_time})  # Logged to wandb History
    return score  # This will be returned out to the wandb Summary

run_model_with_parameters = lambda: run_model(config, X_train, X_test, y_train, y_test)

Then wrap your code in the AsyncModelTimeout.

async_model = AsyncModelTimeout(run_model_with_parameters, 30) # 30 second runtime limit per job
success, score = async_model.run()
wandb.log({"model_completed": success}) # False if the model had to stop early
wandb.log({"f1_score": score})

Critical Notes

One critical issue that anyone doing this should be aware of is that Weights and Biases has a History and a Summary data structure. The History data structure can be populated from within your asynchronous process (via the wandb.log function as usual). If, however, you try to populate the Summary structure (which is what Sweeps uses for hyperparameter tuning) from that child process, it will not work (as of this version). The workaround is simply to make sure you pass your critical Summary measures back to the main process before you’re done.
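
Put concretely, the safe pattern looks like this (a minimal sketch reusing the snippets above; the commented-out Summary write is an example of what not to rely on):

# Inside the subprocess run by AsyncModelTimeout: History logging works.
wandb.log({"fit_predict_time": fit_predict_time})

# Don't rely on Summary writes from inside the subprocess, e.g.:
# wandb.run.summary["f1_score"] = score  # may never reach the server

# Instead, return the metric and log it from the main process:
success, score = async_model.run()
wandb.log({"f1_score": score})  # safe: this runs in the main process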

You can find the full code here: https://github.com/kevroy314/wandb-sweeps-timeout
