Weights and Biases Sweep with Model Timeouts

Kevin Horecka
3 min read · Apr 16, 2020

I was recently trying to use the Weights and Biases package to track some model components. They have a great feature called Sweeps which makes it easy to track hyperparameter tuning. However, it does not include an option to let your model time out after a certain amount of time. Given that some sets of hyperparameters can result in insanely long runtimes, I decided I needed a way to do this. This may not be the best (and certainly isn’t the only) way to make it work, but it gets the job done.

You can find the full code here: https://github.com/kevroy314/wandb-sweeps-timeout

The AsyncModelTimeout Object

To accomplish this, an object called AsyncModelTimeout wraps up a Python multiprocessing.Process object.

from multiprocessing import Process, Manager
import time
import logging

logging.basicConfig(level=logging.DEBUG)

# This wrapper could be used to spawn multiple parallel workers,
# but here we use it just to capture the returned metric value
def _func_wrapper(func, procnum, return_dict):
    return_dict[procnum] = func()

class AsyncModelTimeout():
    """This class can wrap a model training function and ensure it stops in a specified timeout."""
    def __init__(self, target_function, timeout_in_seconds, logging_name=None):
        self.target_function = target_function
        self.timeout_in_seconds = timeout_in_seconds
        self.logger = logging.getLogger('AsyncModelTimeout' if logging_name is None else logging_name)

    def run(self):
        manager = Manager()
        return_dict = manager.dict()
        action_process = Process(target=_func_wrapper, args=(self.target_function, 0, return_dict))
        action_process.start()
        action_process.join(timeout=self.timeout_in_seconds)
        if action_process.is_alive():
            action_process.terminate()
            self.logger.warning("Model timed out. Terminated.")
            return False, None  # No metric exists, so we return None
        else:
            self.logger.info("Model completed successfully.")
            return True, return_dict[0]

The usage is pretty straightforward:

def do_work():
    for i in range(3):  # Run for 3 seconds
        print(i)
        time.sleep(1)
    return "yay"

async_model = AsyncModelTimeout(do_work, 1)  # 1-second timeout
print(async_model.run())  # Will fail: (False, None)
async_model = AsyncModelTimeout(do_work, 4)  # 4-second timeout
print(async_model.run())  # Will succeed: (True, 'yay')

Running a Weights and Biases Sweep

To run with WandB sweeps, you just wrap up your model (here, an example RandomForestClassifier):

import time
import wandb
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Define how to run the model asynchronously
def run_model(config, X_train, X_test, y_train, y_test):
    # Just a demo model, but it needs the config
    clf = RandomForestClassifier(random_state=42, **config)
    # Time the fit and predictions
    t0 = time.time()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    fit_predict_time = time.time() - t0
    # Score it
    score = f1_score(y_test, y_pred)
    wandb.log({"fit_predict_time": fit_predict_time})  # Logged to wandb History
    return score  # This will be returned out to the wandb Summary

run_model_with_parameters = lambda: run_model(config, X_train, X_test, y_train, y_test)

Then wrap your code in the AsyncModelTimeout.

async_model = AsyncModelTimeout(run_model_with_parameters, 30) # 30 second runtime limit per job
success, score = async_model.run()
wandb.log({"model_completed": success}) # False if the model had to stop early
wandb.log({"f1_score": score})

Critical Notes

One critical issue that anyone doing this should be aware of is that Weights and Biases has a History and a Summary data structure. The History data structure can be populated from within your asynchronous process (via the wandb.log function as usual). If, however, you try to populate the Summary structure (which is what Sweeps uses for hyperparameter tuning) from that child process, it will not work (as of this version). The workaround is simply to make sure you pass your critical Summary measures back to the main process before you’re done.
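
Put concretely, the safe pattern looks like this (a minimal sketch reusing the snippets above; the commented-out Summary write is an example of what not to rely on):

# Inside the subprocess run by AsyncModelTimeout: History logging works.
wandb.log({"fit_predict_time": fit_predict_time})

# Don't rely on Summary writes from inside the subprocess, e.g.:
# wandb.run.summary["f1_score"] = score  # may never reach the server

# Instead, return the metric and log it from the main process:
success, score = async_model.run()
wandb.log({"f1_score": score})  # safe: this runs in the main process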

You can find the full code here: https://github.com/kevroy314/wandb-sweeps-timeout
