Switching between Models
Systematic reviews are a cornerstone of evidence-based research, providing comprehensive summaries of the scientific literature. However, manually screening thousands of records to find a handful of relevant studies is labor-intensive, and the datasets involved are typically highly imbalanced, with relevant papers making up fewer than 5% of records. Active learning offers a promising solution by training machine learning models to iteratively prioritize the most relevant records for screening.
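To make the idea concrete, the sketch below simulates such a screening loop with scikit-learn, using TF-IDF features and a Naive Bayes classifier. It is a minimal illustration rather than the ASReview implementation, and `texts` and `true_labels` are hypothetical placeholders standing in for a real, fully labeled simulation dataset.

```python
# Minimal simulation of an active-learning screening loop (a sketch, not the
# ASReview implementation). Assumes `texts` holds titles/abstracts and
# `true_labels` the known 0/1 relevance decisions, as in a simulation study
# where all labels are available up front.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [...]        # hypothetical list of title/abstract strings
true_labels = [...]  # hypothetical 0/1 relevance labels for the simulation

X = TfidfVectorizer(stop_words="english").fit_transform(texts)
labeled = [0, 1]     # seed set: assumed to contain one relevant and one irrelevant record
labels = [true_labels[i] for i in labeled]
screening_order = list(labeled)

while len(labeled) < X.shape[0]:
    model = MultinomialNB().fit(X[labeled], labels)
    unlabeled = [i for i in range(X.shape[0]) if i not in labeled]
    # Certainty-based sampling: screen the record predicted most likely relevant.
    scores = model.predict_proba(X[unlabeled])[:, 1]
    nxt = unlabeled[int(np.argmax(scores))]
    labeled.append(nxt)
    labels.append(true_labels[nxt])
    screening_order.append(nxt)
```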
In his research, Jelle Teijema explored how switching between classification models during the active learning process can further enhance the efficiency of systematic reviews. Since shallow models (e.g., Naïve Bayes, Logistic Regression) perform well when limited labeled data is available, and deeper models (e.g., Convolutional Neural Networks) excel when more training data accumulates, dynamically switching models mid-review could optimize performance across all phases of the review process.
Progress
Jelle's project comprised four main studies:
Demonstrating Active Learning Efficiency: Using a dataset of over 46,000 records, he simulated active learning-based reviews and compared them to manual screening. Metrics such as Work Saved over Sampling (WSS) and Average Time to Discovery (ATD) were used to evaluate performance and to show the existence of "last-to-find" relevant papers.
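The snippet below shows one common way these two metrics can be computed from a simulated screening order; the exact formulations used in the study may differ, and `screening_order` and `true_labels` are hypothetical inputs.

```python
# Hedged sketch of the two evaluation metrics, computed from a simulated
# screening order (the sequence of record indices as they were screened) and
# the known 0/1 relevance labels.
import numpy as np

def wss(screening_order, true_labels, recall=0.95):
    """Work Saved over Sampling at a given recall level (e.g., WSS@95)."""
    n = len(true_labels)
    n_relevant = int(np.sum(true_labels))
    target = int(np.ceil(recall * n_relevant))
    found = 0
    for n_screened, idx in enumerate(screening_order, start=1):
        found += true_labels[idx]
        if found >= target:
            # Fraction of records left unscreened, minus the recall given up
            # by stopping at `recall` instead of finding every relevant paper.
            return (n - n_screened) / n - (1 - recall)
    return 0.0

def atd(screening_order, true_labels):
    """Average Time to Discovery: mean fraction of the dataset screened
    before each relevant record is found (one common formulation)."""
    n = len(true_labels)
    ranks = [pos for pos, idx in enumerate(screening_order, start=1)
             if true_labels[idx] == 1]
    return float(np.mean(ranks)) / n
```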
Development of a CNN Model: Jelle designed a custom 17-layer Convolutional Neural Network (CNN) tailored for systematic review tasks. The model was optimized to balance computational efficiency with the ability to uncover complex semantic relations in the data.
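The exact 17-layer architecture is not reproduced here, but a much smaller Keras sketch of a 1D convolutional text classifier gives a flavour of the model family; all layer sizes and hyperparameters below are illustrative assumptions rather than the study's design.

```python
# Illustrative 1D convolutional text classifier (assumptions only; not the
# custom 17-layer CNN developed in the study).
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim = 20000, 128  # hypothetical settings

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, embed_dim),
    layers.Conv1D(128, 5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(128, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # probability that a record is relevant
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```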
Benchmarking Classifiers and Features: Various combinations of classifiers (Naïve Bayes, SVM, Logistic Regression, Random Forest, shallow neural networks) and feature extraction methods (TF-IDF, Doc2Vec, SBERT) were compared, with results benchmarked against the CNN model.
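As a simplified illustration of such a benchmark, the sketch below compares several scikit-learn classifiers on TF-IDF features using cross-validated ROC AUC. The actual study evaluated model combinations in full active-learning simulations rather than with this offline metric, and the Doc2Vec and SBERT features are omitted; `texts` and `true_labels` are again hypothetical.

```python
# Hedged sketch of a classifier comparison on TF-IDF features (offline
# cross-validation, not the simulation-based evaluation used in the study).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = TfidfVectorizer(stop_words="english").fit_transform(texts)

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, true_labels, cv=5, scoring="roc_auc")
    print(f"{name}: mean ROC AUC = {scores.mean():.3f}")
```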
Switching Model Simulations: Finally, he examined how performance improved when switching from a shallow classifier to a CNN during the review process, using heuristic rules (e.g., switching after labeling 50 irrelevant records in a row) to determine the optimal switching point.
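A hedged sketch of such a switching rule is shown below: screening starts with a light model and hands over to a heavier one after a run of consecutive irrelevant labels. The threshold of 50, the seed records, and the assumption that both models expose a scikit-learn-style fit/predict_proba interface (for a CNN this would require a wrapper) are all illustrative, not the study's exact setup.

```python
# Hedged sketch of the model-switching heuristic used in the simulations.
SWITCH_AFTER = 50  # consecutive irrelevant labels that trigger the switch (illustrative)

def simulate_with_switch(X, true_labels, light_model, heavy_model,
                         seed=(0, 1), switch_after=SWITCH_AFTER):
    labeled = list(seed)                       # assumed to contain both classes
    labels = [true_labels[i] for i in labeled]
    consecutive_irrelevant = 0
    model = light_model

    while len(labeled) < X.shape[0]:
        model.fit(X[labeled], labels)
        unlabeled = [i for i in range(X.shape[0]) if i not in labeled]
        scores = model.predict_proba(X[unlabeled])[:, 1]
        nxt = unlabeled[int(scores.argmax())]
        label = true_labels[nxt]
        labeled.append(nxt)
        labels.append(label)

        # Heuristic: after `switch_after` irrelevant records in a row, hand
        # the rest of the review over to the heavier classifier.
        consecutive_irrelevant = 0 if label == 1 else consecutive_irrelevant + 1
        if model is light_model and consecutive_irrelevant >= switch_after:
            model = heavy_model
    return labeled
```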
Jelle's findings support the use of model-switching strategies in active learning-based systematic review workflows. He advises starting the review process with a light model, such as Naïve Bayes or Logistic Regression, and switching to a heavier classification model like a CNN based on a simple heuristic rule, such as after labeling a set number of irrelevant records in a row. This approach ensures efficient discovery of relevant papers throughout all stages of the review.
Funding
This project is funded by a grant from the Center for Urban Mental Health, University of Amsterdam, The Netherlands.
People involved
- Jelle Teijema - Student
- Jonathan de Bruin - Developer
- Rens van de Schoot - Advisor
- Ayoub Bagheri - Advisor