Universal Medical Image Encoder (UMIE) is an open-source pretrained model for diagnostic imaging. UMIE was pretrained on the UMIE datasets, the largest radiological imaging dataset in the world. The UMIE datasets contain more than 1 million CT, X-ray, and MRI images drawn from 20 open-source datasets. We unified the labels and masks to follow the RadLex ontology from the Radiological Society of North America (RSNA). We also release the pipelines used to convert the datasets to a common format. Each pipeline is composed of reusable steps, so anyone can extend the dataset with new data. The steps are meant to cover the formatting tasks a new dataset may require, e.g. extracting masks from XML and converting DICOM files to PNG. Our goal is to make adding a dataset as simple as drag-and-drop.
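To illustrate what composing a pipeline from reusable steps could look like, here is a minimal Python sketch. It is not the UMIE code: the names `Pipeline`, `parse_xml_polygon`, and `rescale_to_8bit`, the record layout, and the XML schema are all assumptions made for this example, and it uses only the standard library rather than a real DICOM reader.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List
import xml.etree.ElementTree as ET

# A record is a plain dict that each step reads from and adds to (assumed layout).
Record = Dict[str, Any]
Step = Callable[[Record], Record]


def parse_xml_polygon(record: Record) -> Record:
    """Hypothetical step: read mask polygon vertices from an XML annotation."""
    root = ET.fromstring(record["annotation_xml"])
    points = [(int(p.get("x")), int(p.get("y"))) for p in root.iter("point")]
    return {**record, "polygon": points}


def rescale_to_8bit(record: Record) -> Record:
    """Hypothetical step: map raw intensities to 0..255, as a PNG export needs."""
    pixels = record["pixels"]
    lo, hi = min(pixels), max(pixels)
    span = (hi - lo) or 1  # avoid division by zero on a constant image
    scaled = [round((p - lo) * 255 / span) for p in pixels]
    return {**record, "pixels": scaled}


@dataclass
class Pipeline:
    """Runs the configured steps in order, passing the record along."""
    steps: List[Step] = field(default_factory=list)

    def run(self, record: Record) -> Record:
        for step in self.steps:
            record = step(record)
        return record


# Toy input standing in for one image and its annotation file.
example = {
    "pixels": [0, 1000, 4000],
    "annotation_xml": '<mask><point x="1" y="2"/><point x="3" y="4"/></mask>',
}
pipeline = Pipeline(steps=[parse_xml_polygon, rescale_to_8bit])
result = pipeline.run(example)
print(result["polygon"], result["pixels"])
```

Because every step has the same signature, a new dataset can reuse existing steps and add only the ones specific to its format.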
The UMIE model will be made publicly available later this year.
Learn more about the dataset: Medium article.
Repo with data preprocessing pipelines: GitHub.