Recent advances in high-performance computing systems for artificial intelligence enable large-scale training of models that extract information from free-form natural language text. The development of these models is essential to cancer surveillance research and automation. In this study, we propose an approach that accelerates the training of machine learning models by introducing task parallelism. For information extraction from cancer pathology reports, we implement task parallelism by splitting the task of identifying multiple cancer topographies and morphologies into several sub-problems, so that the hyperparameters of each sub-problem can be optimized in parallel. Further, we introduce model parameter inheritance to improve the convergence rate of the hyperparameter optimization runs themselves. We evaluate the feasibility of the proposed method on the Summit supercomputer and demonstrate that it improves time-to-solution by a factor of 10 compared with the traditional model-based optimization algorithm, while maintaining the same level of clinical task performance scores.
Researchers should cite this work as follows: