
Tuesday, March 7, 2017

New algorithm stores a computer operating system in DNA, unlocking its nearly full storage potential

In a study in Science, researchers Yaniv Erlich and Dina Zielinski describe a new coding technique for maximizing the data-storage capacity of DNA molecules. Credit: New York Genome Center

An algorithm designed for streaming video on a cellphone can unlock DNA's nearly full storage potential by squeezing more information into its four base nucleotides, say researchers. They demonstrate that this technology is also extremely reliable.



Humanity may soon generate more data than hard drives or magnetic tape can handle, a problem that has scientists turning to nature's age-old solution for information storage -- DNA.

In a new study in Science, a pair of researchers at Columbia University and the New York Genome Center (NYGC) show that an algorithm designed for streaming video on a cellphone can unlock DNA's nearly full storage potential by squeezing more information into its four base nucleotides. They demonstrate that this technology is also extremely reliable.



DNA is an ideal storage medium because it's ultra-compact and can last hundreds of thousands of years if kept in a cool, dry place, as demonstrated by the recent recovery of DNA from the bones of a 430,000-year-old human ancestor found in a cave in Spain.

"DNA won't degrade over time like cassette tapes and CDs, and it won't become obsolete -- if it does, we have bigger problems," said study coauthor Yaniv Erlich, a computer science professor at Columbia Engineering, a member of Columbia's Data Science Institute, and a core member of the NYGC.

Erlich and his colleague Dina Zielinski, an associate scientist at NYGC, chose six files to encode, or write, into DNA: a full computer operating system, an 1895 French film, "Arrival of a Train at La Ciotat," a $50 Amazon gift card, a computer virus, a Pioneer plaque and a 1948 study by information theorist Claude Shannon.

They compressed the files into a master file, and then split the data into short strings of binary code made up of ones and zeros. Using an erasure-correcting algorithm called fountain codes, they randomly packaged the strings into so-called droplets, and mapped the ones and zeros in each droplet to the four nucleotide bases in DNA: A, G, C and T. The algorithm deleted letter combinations known to create errors, and added a barcode to each droplet to help reassemble the files later.
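To make that packaging step concrete, here is a minimal Python sketch of a droplet-style encoder. It is not the authors' DNA Fountain code: the segment size, the uniform degree distribution, the droplet surplus, and the screening thresholds (single-base runs no longer than three, roughly balanced GC content) are illustrative assumptions.

```python
# Minimal illustrative sketch of the droplet encoding described above, not the
# authors' DNA Fountain implementation. Segment size, degree distribution,
# droplet surplus, and screening thresholds are assumptions for illustration.
import os
import random

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}

def make_droplet(segments, seed):
    """XOR a pseudo-randomly chosen subset of data segments into one droplet."""
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))            # how many segments to mix
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytes(len(segments[0]))                 # all-zero starting value
    for i in chosen:
        payload = bytes(a ^ b for a, b in zip(payload, segments[i]))
    return seed, payload                              # the seed doubles as the barcode

def to_dna(seed, payload, seed_bytes=4):
    """Map the barcode and payload bits onto the four nucleotide bases."""
    raw = seed.to_bytes(seed_bytes, "big") + payload
    bits = "".join(f"{byte:08b}" for byte in raw)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def acceptable(strand, max_run=3, gc_lo=0.45, gc_hi=0.55):
    """Screen out strands prone to synthesis/sequencing errors: long single-base
    runs or unbalanced GC content (thresholds here are illustrative)."""
    run = 1
    for a, b in zip(strand, strand[1:]):
        run = run + 1 if a == b else 1
        if run > max_run:
            return False
    gc = (strand.count("G") + strand.count("C")) / len(strand)
    return gc_lo <= gc <= gc_hi

# Example: package 1 KB of data, cut into 32-byte segments, as screened DNA strands.
data = os.urandom(1024)
segments = [data[i:i + 32] for i in range(0, len(data), 32)]
seed_rng = random.Random(7)
strands = []
while len(strands) < int(1.1 * len(segments)):        # keep a small surplus of droplets
    seed = seed_rng.getrandbits(32)
    strand = to_dna(*make_droplet(segments, seed))
    if acceptable(strand):
        strands.append(strand)
print(len(strands), "strands of", len(strands[0]), "bases each")
```

Because rejected droplets are simply regenerated from a new seed, the screening step costs extra candidates rather than extra stored DNA, which is the sense in which a fountain-code approach lets the encoder avoid error-prone letter combinations cheaply.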



In all, they generated a digital list of 72,000 DNA strands, each 200 bases long, and sent it in a text file to Twist Bioscience, a San Francisco DNA-synthesis startup that specializes in turning digital data into biological data. Two weeks later, they received a vial holding a speck of DNA molecules.

To retrieve their files, they used modern sequencing technology to read the DNA strands, followed by software to translate the genetic code back into binary. They recovered their files with zero errors, the study reports. (In this short demo, Erlich opens his archived operating system on a virtual machine and plays a game of Minesweeper to celebrate.)
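The readback direction can be sketched under the same assumptions as the encoder above: reverse the base-to-bit mapping, use each strand's seed (the barcode) to re-derive which segments were mixed into it, and then peel apart droplets that cover exactly one still-unknown segment.

```python
# Companion sketch to the encoder above (same illustrative assumptions): recover the
# original segments from DNA strands by reversing the base mapping and peeling droplets.
import random

BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}

def from_dna(strand, seed_bytes=4):
    """Translate a strand back into its seed (barcode) and payload bytes."""
    bits = "".join(BASE_TO_BITS[b] for b in strand)
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return int.from_bytes(raw[:seed_bytes], "big"), raw[seed_bytes:]

def decode(strands, n_segments, seed_bytes=4):
    """Peeling decoder: repeatedly resolve droplets that reference exactly one
    still-unknown segment, then reuse the recovered segment everywhere else."""
    droplets = []
    for s in strands:
        seed, payload = from_dna(s, seed_bytes)
        rng = random.Random(seed)                     # same PRNG stream as the encoder
        degree = rng.randint(1, n_segments)
        chosen = set(rng.sample(range(n_segments), degree))
        droplets.append((chosen, payload))
    recovered = {}
    progress = True
    while progress and len(recovered) < n_segments:
        progress = False
        for chosen, payload in droplets:
            unknown = chosen - recovered.keys()
            if len(unknown) != 1:
                continue
            for i in chosen & recovered.keys():       # XOR out every known segment
                payload = bytes(a ^ b for a, b in zip(payload, recovered[i]))
            recovered[unknown.pop()] = payload
            progress = True
    return recovered                                  # maps segment index -> bytes
```

With the toy uniform degree distribution used above, peeling can stall before every segment is recovered; practical Luby-transform fountain codes pick the degree distribution (typically a robust soliton) so that decoding completes with only a small surplus of droplets, which is consistent with the error-free recovery the study reports.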

They also demonstrated that a virtually unlimited number of copies of the files could be created with their coding technique by multiplying their DNA sample through polymerase chain reaction (PCR), and that those copies, and even copies of their copies, and so on, could be recovered error-free.



Finally, the researchers show that their coding strategy packs 215 petabytes of data on a single gram of DNA -- 100 times more than methods published by pioneering researchers George Church at Harvard, and Nick Goldman and Ewan Birney at the European Bioinformatics Institute. "We believe this is the highest-density data-storage device ever created," said Erlich.

The capacity of DNA data storage is theoretically limited to two binary digits for each nucleotide, but the biological constraints of DNA itself, and the need to include redundant information to reassemble and read the fragments later, reduce its capacity to 1.8 binary digits per nucleotide base.
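That two-bit ceiling is simply the information content of DNA's four-letter alphabet:

```latex
H_{\max} = \log_2 4 = 2 \ \text{bits per nucleotide}
```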

The team's insight was to apply fountain codes, a technique Erlich remembered from graduate school, to make the reading and writing process more efficient. With their DNA Fountain technique, Erlich and Zielinski pack an average of 1.6 bits into each base nucleotide. That's at least 60 percent more data than previously published methods, and close to the 1.8-bit limit.
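Putting the numbers quoted here side by side:

```latex
\frac{1.6}{2.0} = 80\% \ \text{of the raw two-bit ceiling}, \qquad
\frac{1.6}{1.8} \approx 89\% \ \text{of the practically achievable limit}
```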

Cost remains a barrier. The researchers spent $7,000 to synthesize the DNA they used to archive their 2 megabytes of data, and another $2,000 to read it. Though the price of DNA sequencing has fallen exponentially, there may not be the same demand for DNA synthesis, says Sri Kosuri, a biochemistry professor at UCLA who was not involved in the study. "Investors may not be willing to risk tons of money to bring costs down," he said.



But the price of DNA synthesis can be vastly reduced if lower-quality molecules are produced, and coding strategies like DNA Fountain are used to fix molecular errors, says Erlich. "We can do more of the heavy lifting on the computer to take the burden off time-intensive molecular coding," he said.
Source: Materials provided by Columbia University School of Engineering and Applied Science.


Saturday, February 18, 2017

The Internet and your brain are more alike than you think

Salk scientist finds similar rule governing traffic flow in engineered and biological systems. Credit: Salk Institute

A similar rule governs traffic flow in engineered and biological systems, reports a researcher. An algorithm used for the Internet is also at work in the human brain, says the report, an insight that improves our understanding of engineered and neural networks and potentially even learning disabilities.



Although we spend a lot of our time online nowadays -- streaming music and video, checking email and social media, or obsessively reading the news -- few of us know about the mathematical algorithms that manage how our content is delivered. But deciding how to route information fairly and efficiently through a distributed system with no central authority was a priority for the Internet's founders. Now, a Salk Institute discovery shows that an algorithm used for the Internet is also at work in the human brain, an insight that improves our understanding of engineered and neural networks and potentially even learning disabilities.



"The founders of the Internet spent a lot of time considering how to make information flow efficiently," says Salk Assistant Professor Saket Navlakha, coauthor of the new study that appears online in Neural Computation on February 9, 2017. "Finding that an engineered system and an evolved biological one arise at a similar solution to a problem is really interesting."
In the engineered system, the solution involves controlling information flow so that routes are neither clogged nor underutilized, by checking how congested the Internet is. To accomplish this, the Internet employs an algorithm called "additive increase, multiplicative decrease" (AIMD), in which your computer sends a packet of data and then listens for an acknowledgement from the receiver: if the packet is promptly acknowledged, the network is not overloaded and your data can be transmitted at a higher rate. With each successive successful packet, your computer knows it's safe to increase its speed by one unit, which is the additive-increase part. But if an acknowledgement is delayed or lost, your computer knows that there is congestion and slows down by a large amount, such as by half, which is the multiplicative-decrease part.

In this way, users gradually find their "sweet spot," and congestion is avoided because users take their foot off the gas, so to speak, as soon as they notice a slowdown. As computers throughout the network use this strategy, the whole system can continuously adjust to changing conditions, maximizing overall efficiency.
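A minimal Python sketch of that rule (the unit of increase, the halving factor, and the toy congestion signal are illustrative choices, not the actual TCP implementation):

```python
# Sketch of additive increase, multiplicative decrease (AIMD) as described above.
# Constants and the congestion signal are illustrative, not real TCP parameters.
def aimd_step(rate, ack_received, increase=1.0, decrease=0.5, floor=1.0):
    if ack_received:
        return rate + increase          # additive increase: one unit per acknowledged round
    return max(floor, rate * decrease)  # multiplicative decrease: back off sharply

# Example: a sender probing for its "sweet spot" on a link that congests above 20 units.
rate, capacity = 1.0, 20.0
for _ in range(60):
    ack = rate <= capacity              # toy stand-in for a timely acknowledgement
    rate = aimd_step(rate, ack)
    print(round(rate, 1), end=" ")
```

Run for a few dozen rounds, the toy sender climbs one unit at a time, overshoots, halves, and then saws back and forth just around the link's capacity, the "sweet spot" behavior described above.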

Navlakha, who develops algorithms to understand complex biological networks, wondered if the brain, with its billions of distributed neurons, was managing information similarly. So, he and coauthor Jonathan Suen, a postdoctoral scholar at Duke University, set out to mathematically model neural activity.



Because AIMD is one of a number of flow-control algorithms, the duo decided to model six others as well. In addition, they analyzed which model best matched physiological data on neural activity from 20 experimental studies. In their models, AIMD turned out to be the most efficient at keeping the flow of information moving smoothly, adjusting traffic rates whenever paths got too congested. More interestingly, AIMD also turned out to best explain what was happening to neurons experimentally.

It turns out the neuronal equivalent of additive increase is called long-term potentiation. It occurs when one neuron fires closely after another, which strengthens their synaptic connection and makes it slightly more likely the first will trigger the second in the future. The neuronal equivalent of multiplicative decrease occurs when the firing of two neurons is reversed (second before first), which weakens their connection, making the first much less likely to trigger the second in the future. This is called long-term depression. As synapses throughout the network weaken or strengthen according to this rule, the whole system adapts and learns.
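As a purely illustrative sketch of that analogy (not the authors' model; the step size, decay factor, and bounds are arbitrary), a synaptic weight can be updated the way an AIMD sender updates its rate, added to when the presynaptic spike precedes the postsynaptic one and multiplied down when the order is reversed:

```python
# Toy AIMD-style plasticity rule mirroring the analogy in the text: LTP-like additive
# strengthening for pre-before-post spike pairs, LTD-like multiplicative weakening
# for post-before-pre pairs. All constants are arbitrary illustrative values.
def update_weight(w, dt_ms, ltp_step=0.05, ltd_factor=0.5, w_max=1.0, w_min=0.01):
    """dt_ms = t_post - t_pre: positive means the presynaptic neuron fired first."""
    if dt_ms > 0:
        return min(w_max, w + ltp_step)   # additive increase (long-term potentiation)
    return max(w_min, w * ltd_factor)     # multiplicative decrease (long-term depression)

w = 0.2
for dt in [5, 3, 8, -4, 6, 2, -7]:        # toy sequence of spike-timing differences (ms)
    w = update_weight(w, dt)
print(round(w, 3))
```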

"While the brain and the Internet clearly operate using very different mechanisms, both use simple local rules that give rise to global stability," says Suen. "I was initially surprised that biological neural networks utilized the same algorithms as their engineered counterparts, but, as we learned, the requirements for efficiency, robustness, and simplicity are common to both living organisms and the networks we have built."



Understanding how the system works under normal conditions could help neuroscientists better understand what happens when these results are disrupted, for example, in learning disabilities. "Variations of the AIMD algorithm are used in basically every large-scale distributed communication network," says Navlakha. "Discovering that the brain uses a similar algorithm may not be just a coincidence."
Story Source:
Materials provided by Salk Institute.


 