When it comes to identifying patterns in mountains of data, human beings are no match for artificial intelligence (AI). In particular, a branch of AI called machine learning is often used to find patterns in datasets, be it for stock market analysis, image and speech recognition or cell classification. To reliably distinguish cancer cells from healthy cells, a team led by Dr. Altuna Akalin, head of the Bioinformatics and Omics Data Science Platform at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), has now developed a machine learning program called “ikarus”. The program found a pattern in tumor cells that is common to different types of cancer, consisting of a characteristic combination of genes. According to the team's article in the journal Genome Biology, the algorithm also detected types of genes in the pattern that had never been clearly linked to cancer before.
Machine learning essentially means that an algorithm uses training data to learn how to answer certain questions on its own. It does this by looking for patterns in the data that help it solve problems. After the training phase, the system can generalize from what it has learned in order to evaluate unknown data. “It was a big challenge to get relevant training data, where experts had already made a clear distinction between ‘healthy’ and ‘cancer’ cells,” said Jan Dohmen, the first author of the article.
Surprisingly high success rate
In addition, single-cell sequencing datasets are often noisy. This means that the information they contain about the molecular characteristics of individual cells is not very accurate, perhaps because a different number of genes is detected in each cell or because samples are not always processed in the same way. According to Dohmen and his colleague Dr. Vedran Franke, co-leader of the study, they reviewed countless publications and contacted many research groups to obtain adequate datasets. The team eventually used data from lung cancer and colorectal cancer cells to train the algorithm before applying it to datasets for other types of tumors.
In the training phase, ikarus had to find a list of characteristic genes, which it then used to categorize the cells. “We tried and refined different approaches,” says Dohmen. It was a time-consuming job, as all three scientists say. “The key was for ikarus to eventually use two lists: one for cancer genes and one for genes from other cells,” Franke said. After the training phase, the algorithm was able to reliably distinguish between healthy and tumor cells in other cancers, such as in tissue samples from patients with liver cancer or neuroblastoma. Its success rate was extremely high, which surprised even the research team. “We didn’t expect there to be a common signature that so accurately defines tumor cells in different cancers,” Akalin said. “But we still can’t say if the method works for all cancers,” Dohmen added. To make ikarus a reliable tool for diagnosing cancer, the researchers now want to test it on additional types of tumors.
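The two-list idea can be illustrated with a deliberately simplified sketch. It is not the ikarus implementation: the gene names, expression values and the "higher average wins" rule below are all assumptions made for illustration, whereas the published tool learns its gene lists from annotated training data and uses a trained classifier.

```python
from statistics import mean

# Hypothetical placeholder gene lists; NOT the signatures reported by the ikarus authors.
TUMOR_GENES = ["GENE_T1", "GENE_T2", "GENE_T3"]
NORMAL_GENES = ["GENE_N1", "GENE_N2", "GENE_N3"]


def signature_score(expression, genes):
    """Mean expression over the listed genes; genes missing from the cell count as 0."""
    return mean(expression.get(gene, 0.0) for gene in genes)


def classify_cell(expression):
    """Label a cell by whichever gene list has the higher average expression."""
    tumor = signature_score(expression, TUMOR_GENES)
    normal = signature_score(expression, NORMAL_GENES)
    return "tumor" if tumor > normal else "normal"


# Toy expression profiles (made-up numbers, for illustration only).
cells = {
    "cell_a": {"GENE_T1": 5.0, "GENE_T2": 4.0, "GENE_N1": 1.0},
    "cell_b": {"GENE_N1": 6.0, "GENE_N2": 3.0, "GENE_T3": 0.5},
}

labels = {name: classify_cell(profile) for name, profile in cells.items()}
print(labels)
```

Scoring each cell against two opposing lists, rather than a single tumor list, gives the comparison a built-in reference point, which may help when overall expression levels vary from cell to cell.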
AI as a fully automated diagnostic tool
The project aims to go beyond classifying “healthy” versus “cancer” cells. In initial tests, ikarus has already demonstrated that the method can also distinguish other cell types (and certain subtypes) from tumor cells. “We want to make the approach more comprehensive,” says Akalin, “to develop it further so that it can distinguish between all possible cell types in a biopsy.”
In hospitals, pathologists tend to examine tissue samples from tumors only under a microscope to identify the different cell types. This is laborious, time-consuming work. With ikarus, this step could one day become a fully automated process. In addition, Akalin notes, the data could be used to draw conclusions about the immediate environment of the tumor. This could help doctors choose the best therapy, because the composition of the cancerous tissue and its microenvironment often indicates whether a particular treatment or drug will be effective. AI could also be useful in the development of new drugs. “Ikarus allows us to identify genes that are potential drivers of cancer,” says Akalin. New therapeutic agents could then be used to target these molecular structures.
Collaboration from home offices
A remarkable aspect of the publication is that it was produced entirely during the COVID pandemic. None of those involved were at their usual desks at the Berlin Institute for Medical Systems Biology (BIMSB), which is part of the MDC. Instead, they worked from their home offices and communicated only digitally. As Franke puts it, “The project shows that a digital structure can be created to facilitate scientific work under these conditions.”
Reference: Dohmen J, Baranovskii A, Ronen J, Uyar B, Franke V, Akalin A. Identifying tumor cells at the single-cell level using machine learning. Genome Biol. 2022;23(1):123. doi: 10.1186/s13059-022-02683-1