These AIs were designed to read sentences, but they can also catch coronavirus mutations

Natural language processing (NLP) algorithms have played a major role in analyzing text sentiment and extracting meaning, which is why they have spurred the development of applications like chatbots and virtual assistants. However, these same algorithms have now shown a surprising yet welcome ability: generating protein sequences and predicting virus mutations.

In a study published in Science today, computational biologist Bonnie Berger and her colleagues demonstrated how NLP can be used to predict mutations that allow viruses to avoid detection by antibodies in the human immune system, a process aptly known as viral immune escape.

As with many developments in the world of science and technology, the fundamental insight behind this one is pretty simple. As it happens, many properties of biological systems can be understood in terms of words and sentences. In this case, interpreting the protein sequence of a virus is very much like interpreting the sequence of words and characters in a sentence.
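
To make the analogy concrete, here is a minimal sketch of how a protein could be treated as a "sentence" whose "words" are amino acids. It is an illustration only, not the team's code: the names `VOCAB`, `tokenize`, and `single_mutants` are hypothetical, and the sketch assumes proteins are written with the 20 standard one-letter amino acid codes.

```python
from typing import List, Tuple

# The 20 standard amino acids, written with their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: each residue is a "word" with an integer id.
VOCAB = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(protein: str) -> List[int]:
    """Turn a protein string into a list of token ids, one per residue."""
    return [VOCAB[aa] for aa in protein.upper()]


def single_mutants(protein: str) -> List[Tuple[str, str]]:
    """List every single-residue substitution as (label, mutated sequence)."""
    mutants = []
    for pos, original in enumerate(protein):
        for new in AMINO_ACIDS:
            if new != original:
                # Label in the usual style: original residue, 1-based position, new residue.
                label = f"{original}{pos + 1}{new}"
                mutants.append((label, protein[:pos] + new + protein[pos + 1:]))
    return mutants
```

Each mutated sequence is then a slightly altered "sentence" that a language model can score, in the same way it would score a sentence with one word swapped.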

Berger’s team used two different linguistic concepts: grammar and semantics (meaning). The genetic or evolutionary fitness of a virus (characteristics such as how good it is at infecting a host) can be interpreted in terms of grammatical correctness. A successful, infectious virus is “grammatically correct”, while an unsuccessful one is not.

Similarly, mutations of a virus can be interpreted in terms of semantics. Mutations that change how a virus appears to its environment, such as changes in its surface proteins that make it invisible to certain antibodies, alter its meaning. Viruses with different mutations can have different meanings, and a virus with a different meaning may need different antibodies to read it.

To model these properties, the team used a Long Short-Term Memory (LSTM) neural network trained on thousands of genetic sequences taken from three different viruses: 45,000 unique sequences for influenza, 60,000 for HIV, and somewhere between 3,000 and 4,000 for a strain of coronavirus.

Why less data for the coronavirus strain? According to Brian Hie, an MIT student involved in building the models, this is simply because there has been less surveillance of the virus responsible for the COVID-19 pandemic.

The team's ultimate aim was to identify mutations that might let a virus escape the immune system without making it less infectious. In more NLP-friendly words, this means finding mutations that change the virus’s meaning without making it grammatically incorrect.
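
The search described above can be sketched as a ranking problem. The example below assumes a model has already produced two numbers for each candidate mutation: a semantic-change score (higher means the meaning changed more) and a grammaticality score (higher means the virus remains more plausible). Adding the two ranks is one simple way to combine them and is an assumption made here for illustration, not a statement of the team's exact method. The function names, mutation labels, and scores are all made up.

```python
from typing import Dict, List, Tuple


def rank_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank mutations from 1 (lowest score) to n (highest score)."""
    ordered = sorted(scores, key=lambda name: scores[name])
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def escape_candidates(
    semantic_change: Dict[str, float],
    grammaticality: Dict[str, float],
) -> List[Tuple[str, int]]:
    """Order mutations so that those scoring high on BOTH criteria come first."""
    sem_rank = rank_scores(semantic_change)
    gram_rank = rank_scores(grammaticality)
    combined = {m: sem_rank[m] + gram_rank[m] for m in semantic_change}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for four hypothetical mutations.
semantic_change = {"mut_a": 0.9, "mut_b": 0.8, "mut_c": 0.1, "mut_d": 0.5}
grammaticality = {"mut_a": 0.2, "mut_b": 0.9, "mut_c": 0.7, "mut_d": 0.1}

ranking = escape_candidates(semantic_change, grammaticality)
```

In this toy data, `mut_b` comes out on top because it changes the meaning a lot while staying grammatical. `mut_a` changes the meaning most but is penalized for low grammaticality, and `mut_c` is grammatical but barely changes the meaning.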

To test their approach, the team used a common metric for assessing predictions made by machine-learning models that scores accuracy on a scale between 0.5 (no better than chance) and 1 (perfect). In this case, they took the top mutations identified by the tool and, using real viruses in a lab, checked how many of them were actual escape mutations. Their results ranged from 0.69 for HIV to 0.85 for one coronavirus strain. This is better than results from other state-of-the-art models, they say.
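
The article does not name the metric, but a scale running from 0.5 (chance) to 1 (perfect) matches the area under the ROC curve (AUC), so that is assumed here. The sketch below computes it directly from its definition: the probability that a randomly chosen true escape mutation receives a higher score than a randomly chosen non-escape mutation. The function name `auc` and the inputs are hypothetical.

```python
from typing import List


def auc(scores: List[float], is_escape: List[bool]) -> float:
    """Fraction of (escape, non-escape) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, is_escape) if y]
    negatives = [s for s, y in zip(scores, is_escape) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one escape and one non-escape mutation")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A model that gives every mutation the same score gets 0.5 under this definition, and one that scores every true escape mutation above every non-escape mutation gets 1.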

NLP algorithms that identify coronavirus mutations are useful because hospitals and public health institutes can use that knowledge to plan proactively. For instance, the algorithm can show how much a flu strain’s meaning has changed over a certain period of time, which can help an expert determine how well the antibodies developed by patients’ immune systems are performing.

The team has so far been busy running their models on all kinds of coronavirus variants, including the notorious British variant, the mink mutation from Denmark, and other variants from South Africa, Singapore, and Malaysia.

Hamza Zakir
