Coded in your DNA: How Singapore can help avert a global data storage crisis

The world is facing a looming data storage crisis, and Singapore can help to avert it.

In 2018, people watched 4.33 million videos on YouTube, sent 159 million emails and posted 49,000 photographs on Instagram every minute of the year, among other data uses.

At this rate, we will produce 418 zettabytes of data this year, according to the World Economic Forum, and even more in the future.

A single zettabyte is a trillion gigabytes.

Our current methods of storing all this data are not sustainable, for several reasons. Most digital archives are now stored on magnetic and optical storage systems, but the materials used to produce these will run out in less than a century, if not sooner.

Meanwhile, the environmental and economic cost of server farms, which already make up three per cent of global electricity use and two per cent of greenhouse gas emissions, will soar.

‘All of YouTube in a teaspoon’

While scientists have been investigating alternative methods of storing data, one stands out. DNA-based data storage, which stores information in manmade strands of DNA, has three key advantages.

It has extremely high data storage density, remains stable for hundreds of years, and requires very little power.

In 2019, scientists in Israel announced that they had developed a way to store more than 10 petabytes, or 10 million gigabytes, in a single gram of DNA. This means that, theoretically, all of YouTube’s data could be stored in a teaspoon of DNA.
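To put these figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted in this article (418 zettabytes a year, 10 petabytes per gram) and assumes decimal units; the function name is purely illustrative. Since the reported density is "more than" 10 petabytes per gram, the mass it computes should be read as an upper estimate.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
PETABYTE = 10**15
ZETTABYTE = 10**21

# Figures quoted in the article.
ANNUAL_DATA_BYTES = 418 * ZETTABYTE      # data produced in one year
DENSITY_BYTES_PER_GRAM = 10 * PETABYTE   # reported density of DNA storage


def grams_of_dna(data_bytes: int, density: int = DENSITY_BYTES_PER_GRAM) -> float:
    """Mass of DNA, in grams, needed to hold data_bytes at the given density."""
    return data_bytes / density


print(ZETTABYTE // GIGABYTE)        # gigabytes in one zettabyte: 1000000000000
print(10 * PETABYTE // GIGABYTE)    # gigabytes in 10 petabytes: 10000000
print(grams_of_dna(ANNUAL_DATA_BYTES) / 1_000_000, "tonnes of DNA")
```

On these assumptions, a full year of the world's data would fit in roughly 42 tonnes of DNA.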

Scientists have been working on DNA-based data storage methods for nearly a decade, but major obstacles remain, and this is where Singapore can play a key role.

The key challenges

First, a quick explanation of how DNA-based data storage works. Each DNA molecule consists of linked components called nucleotides, which come in four types: guanine, cytosine, adenine and thymine, represented by the letters G, C, A and T.

To store information in DNA, digital data, which consists of 0s and 1s, is translated into sequences made up of the G, C, A and T letters.

Companies or other organisations then manufacture synthetic DNA molecules representing those translated sequences and store them. To retrieve the data, the synthetic DNA molecules are sequenced, and the output translated back into the original digital information.
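The translation step can be sketched in a few lines of Python. This is a minimal illustration only: the mapping of two bits to each letter (00 to A, 01 to C, 10 to G, 11 to T) is an assumption chosen for simplicity, and real schemes use more elaborate codes. The function names are hypothetical.

```python
# Hypothetical mapping: each nucleotide carries two bits.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a string of A, C, G and T letters."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate a sequenced strand back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)            # CAGACGGC
print(decode(strand))    # b'Hi'
```

Under this mapping every byte becomes four nucleotides, and decoding simply reverses the lookup.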

While this method has been tried and tested, there are significant challenges. The cost of sequencing DNA has fallen dramatically in recent years. The cost of producing synthetic DNA molecules, however, is still prohibitively high.

Currently, it costs about US$5 million (S$6.7 million) to store just one gigabyte of data – a lot of money to store not even a full DVD movie!

Creating DNA molecules and sequencing them also involve biochemical and biophysical processes that are prone to errors. The process of writing DNA to produce the synthetic molecules, for example, is vulnerable to substitution, insertion and deletion errors.
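The three error types can be illustrated with a small Python sketch. The helper names and the example strand are made up for illustration, and the errors are applied at a chosen position rather than at random, so this is not a model of any real synthesis process.

```python
def substitute(strand: str, pos: int, base: str) -> str:
    """Replace the nucleotide at pos with a different one."""
    return strand[:pos] + base + strand[pos + 1:]


def insert(strand: str, pos: int, base: str) -> str:
    """Add an extra nucleotide before position pos."""
    return strand[:pos] + base + strand[pos:]


def delete(strand: str, pos: int) -> str:
    """Drop the nucleotide at pos."""
    return strand[:pos] + strand[pos + 1:]


def mismatches(a: str, b: str) -> int:
    """Count differing positions, plus any difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


original = "CAGACGGC"
for damaged in (substitute(original, 2, "T"),
                insert(original, 2, "T"),
                delete(original, 2)):
    print(damaged, mismatches(original, damaged))
```

A substitution changes a single letter, while an insertion or deletion shifts every letter after it, which is part of why such errors are harder to correct.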

The Singapore connection

In Singapore, several teams of researchers are hard at work on these problems.

At the National University of Singapore, Associate Professor Poh Chueh Loo, Associate Professor Yew Wen Shan, and their colleagues are working on more efficient ways to synthesise DNA sequences.

The Singapore University of Technology and Design’s Advanced Coding and Signal Processing Laboratory, where I am a visiting scholar, is another local nexus of research in the field.

The laboratory, under the leadership of Associate Professor Cai Kui, its founder, has been developing algorithms to prevent, detect and correct errors in writing and sequencing DNA.

We have found, for instance, that when the same nucleotide is repeated more than four times in a row, the probability of sequencing errors rises substantially. We have also described how to design algorithms to translate data into strands of nucleotides that meet various error-limiting conditions.
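As a simple illustration of such a condition, the Python sketch below checks whether a strand keeps every run of identical nucleotides to four or fewer. The limit of four comes from the finding described above; the function names are hypothetical, and this is only a checker, not the laboratory's encoding algorithm.

```python
from itertools import groupby

MAX_RUN = 4  # longer runs are the ones associated with more sequencing errors


def longest_run(strand: str) -> int:
    """Length of the longest run of one repeated nucleotide."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


def satisfies_run_limit(strand: str, max_run: int = MAX_RUN) -> bool:
    """True if no nucleotide repeats more than max_run times in a row."""
    return longest_run(strand) <= max_run


print(longest_run("ACGTTTTTAC"))          # 5
print(satisfies_run_limit("ACGTTTTTAC"))  # False
print(satisfies_run_limit("ACGTTTTAC"))   # True
```

An encoder that respects the limit would only ever emit strands for which this check passes.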

Furthermore, we calculated the maximum number of data bits that can be stored per nucleotide if a constraint is imposed to prevent too many repetitions of a nucleotide in a row.
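One standard way to estimate such a limit is to count how many valid strands exist at each length and see how fast that number grows. The Python sketch below does this with a simple counting recurrence; it is a textbook-style approximation under that assumption, not necessarily the method used in the laboratory's work, and the function name and the default length of 200 are arbitrary choices.

```python
import math


def run_limited_capacity(max_run: int, alphabet: int = 4, length: int = 200) -> float:
    """Approximate bits per nucleotide when no run may exceed max_run."""
    # ends_with[k - 1] counts valid strands whose final run has length k.
    ends_with = [alphabet] + [0] * (max_run - 1)
    previous_total = sum(ends_with)
    for _ in range(length):
        previous_total = sum(ends_with)
        # Start a new run with any of the other letters, or extend
        # each existing run by one (runs already at max_run cannot grow).
        ends_with = [(alphabet - 1) * previous_total] + ends_with[:-1]
    # The growth factor per added nucleotide, in bits.
    return math.log2(sum(ends_with) / previous_total)


for limit in (1, 2, 3, 4):
    print(limit, round(run_limited_capacity(limit), 4))
```

With no constraint, four letters would carry exactly two bits each. The sketch suggests that capping runs at four costs very little of that, while forbidding any repetition at all drops the figure to about 1.58 bits.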

Much more work needs to be done to make DNA-based data storage viable, including in areas such as how to restore lost data. In hard disk drives, data is stored in fixed places, so even if you lose some data, the fact that you know what is supposed to go where can help you to restore the missing pieces.

A pool of DNA, however, is like coffee in a pot, with free-floating molecules. This makes data restoration much more difficult.

Still, DNA-based data storage remains one of the most promising solutions to our impending data storage crisis. And Singapore, with its vibrant research sector and excellent expertise in the sciences, is well-positioned to be a leader in this research field.

This article was first published on February 6, 2020

The post Coded in your DNA: How Singapore can help avert a global data storage crisis appeared first on e27.