Memory-Driven Computing and The Machine

Back in November I received an unusual request: to take part in a conversation at the Discover expo in London, an event put on by Hewlett Packard Enterprise (HPE) to showcase their new technologies. The occasion was a project called simply The Machine, a step forward in what’s known as “memory-driven computing.” On the one hand, I am not in any sense an expert in high-performance computing technologies. On the other hand (full disclosure alert), they offered to pay me, which is always nice. What they were looking for was simply someone who could speak to the types of scientific research that would be aided by this kind of approach to large-scale computation. After looking into it, I thought that I could sensibly talk about some research projects that were relevant to the program, and the technology itself seemed very interesting, so I agreed to stop by London on the way from Los Angeles to a conference in Rome in honor of Georges Lemaître (who, coincidentally, was a pioneer in scientific computing).

Everyone knows about Moore’s Law: computer processing power doubles about every eighteen months. It’s that progress that has enabled the massive technological changes witnessed over the past few decades, from supercomputers to handheld devices. The problem is, exponential growth can’t go on forever, and indeed Moore’s Law seems to be ending. It’s a pretty fundamental problem — you can only make components so small, since atoms themselves have a fixed size. The best current technologies sport numbers like 30 atoms per gate and 6 atoms per insulator; we can’t squeeze things much smaller than that.

So how do we push computers to faster processing, in the face of such fundamental limits? HPE’s idea with The Machine (okay, the name could have been more descriptive) is memory-driven computing — change the focus from the processors themselves to the stored data they are manipulating. As I understand it (remember, not an expert), in practice this involves three aspects:

  1. Use “non-volatile” memory — a way to store data without actively using power.
  2. Wherever possible, use photonics rather than ordinary electronics. Photons move faster than electrons, and cost less energy to get moving.
  3. Switch the fundamental architecture, so that input/output and individual processors access the memory as directly as possible.

Here’s a promotional video, made by people who actually are experts.

The project is still in the development stage; you can’t buy The Machine at your local Best Buy. But the developers have imagined a number of ways that the memory-driven approach might change how we do large-scale computational tasks. Back in the early days of electronic computers, processing speed was so slow that it was simplest to store large tables of special functions (sines, cosines, logarithms, etc.) and just look them up as needed. With the huge capacities and swift access of memory-driven computing, that kind of “pre-computation” strategy becomes effective for a wide variety of complex problems, from facial recognition to planning airline routes.
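The table-lookup idea can be sketched in a few lines. This is only an illustration of trading memory for computation, not anything taken from HPE: the table size and the `table_sin` helper are assumptions made up for the example.

```python
import math

# Hypothetical resolution: more entries cost more memory but give better accuracy.
TABLE_SIZE = 4096
STEP = 2 * math.pi / TABLE_SIZE

# Pay the computation cost once, up front. One extra entry at the end
# lets the interpolation below read index + 1 without wrapping around.
SINE_TABLE = [math.sin(i * STEP) for i in range(TABLE_SIZE + 1)]


def table_sin(x: float) -> float:
    """Approximate sin(x) by linear interpolation between stored entries."""
    position = (x % (2 * math.pi)) / STEP
    # Clamp so that rounding at the top of the range cannot overrun the table.
    index = min(int(position), TABLE_SIZE - 1)
    fraction = position - index
    low, high = SINE_TABLE[index], SINE_TABLE[index + 1]
    return low + fraction * (high - low)


if __name__ == "__main__":
    for angle in (0.0, 1.0, 2.5, -0.75):
        print(angle, table_sin(angle), math.sin(angle))
```

Each call does one memory read and a little arithmetic instead of evaluating the function from scratch. The larger and faster the memory, the finer the table can be made.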

It’s not hard to imagine how physicists would find this useful, so that’s what I briefly talked about in London. Two aspects in particular are pretty obvious. One is searching for anomalies in data, especially in real time. We’re in a data-intensive era in modern science, where very often we have so much data that we can only find signals we know how to look for. Memory-driven computing could offer the prospect of greatly enhanced searches for generic “anomalies” — patterns in the data that nobody had anticipated. You can imagine how that might be useful for something like LIGO’s search for gravitational waves, or the real-time sweeps of the night sky we anticipate from the Large Synoptic Survey Telescope.

The other obvious application, of course, is on the theory side, to large-scale simulations. In my own bailiwick of cosmology, we’re doing better and better at including realistic physics (star formation, supernovae) in simulations of galaxy and large-scale structure formation. But there’s a long way to go, and improved simulations are crucial if we want to understand the interplay of dark matter and ordinary baryonic physics in accounting for the dynamics of galaxies. So if a dramatic new technology comes along that allows us to manipulate and access huge amounts of data (e.g. the current state of a cosmological simulation) rapidly, that would be extremely useful.

Like I said, HPE compensated me for my involvement. But I wouldn’t have gone along if I didn’t think the technology was intriguing. We take improvements in our computers for granted; keeping up with expectations is going to require some clever thinking on the part of engineers and computer scientists.

  1. The trade-off between memory and computation has always been around. If memory cost and access time go down, there’s bound to be a shift in that direction. It doesn’t seem that new to me.

  2. Years (and years) ago, I attended a talk by Grace Hopper, who was involved in early computer development for the Navy and was a co-developer of the Common Business-Oriented Language (COBOL). She handed out short pieces of wire equal to the distance an electrical signal can travel in a nanosecond, to show that to speed up computers you had to make them smaller. Hopper was famous for saying, “It is easier to say you are sorry than it is to get permission.”

  3. I have a question tangentially related to the second application, cosmic simulations: how far back in time is the evolution of the Earth or Solar System determined?

    Clearly, under general relativistic simulations the future state would be determined by the past state, as the theory is fully deterministic. However, which parts of the primordial interstellar gas cloud condense, and hence which solar system forms, is a matter of random fluctuation. Is it a thermal fluctuation that’s determined by the past, or is it a quantum fluctuation? Even if it is a classical fluctuation, going further back in time we’ll reach prior events that determined the part that would condense. How far back do we need to go before it would be a quantum fluctuation that will determine which solar system forms?

    A natural answer is that we need to go all the way back to the initial inhomogeneities left by cosmic inflation, but I suspect that’s overkill. The details of which solar system forms are probably quite sensitive to the details of the initial condition; I wouldn’t be surprised if the dynamics is chaotic in this respect. So even a small quantum perturbation of the conditions at some time well after the big bang should be enough to cascade into a huge difference by the time the solar system forms.

  4. What a great video it is. Five minutes of beating around the bush without actually telling us anything. Exchange literally a few words and you can use it for any project. Nuclear bomb. World peace. Dyson Sphere. Ebola Spores. We worked years on it and now we got it going. Its potential is endless. We’re so proud about it.

  5. Light is only used in electronics in optic cables, used to transmit large amounts of information. That is because more signals can be sent at a time. A lot of times I feel like physicists confuse the transmission of light through the electromagnetic force (like in a magnet) with how electronic components actually work through a wire. This scenario would be more accurate when describing an inductor or switch in high power systems (which have inductors in them). An electronic signal has more to do with the rate of change of the strength of the field itself (change in voltage). Then an electronic high has the presence of electrons and a low has an absence of electrons. The frequency of this change determines the signal. Then a signal can change from high to low almost instantaneously, like a square wave for example. Then the peak and trough of an electronic signal are measured in voltage.

    AMD is supposed to release a new chip architecture, Zen, which has RAM (memory) built right on top of it, sometime next year. I think it will be interesting to see how this affects changes in technology. The idea is that it can store common processes so that the processor has easy access to data it needs regularly. Then the processor doesn’t have to work as hard to come up with that data anymore and can just reuse the data it has already figured out.

  6. I am not sure how the hardware maps to the current development of MapReduce analog distributed processing on hardware agnostic virtual machines, and the need for a mobile local cloud [I can’t remember the latest term for that] around users such as persons and cars. The MDC architecture sounds like a data lake memory dump idea, except that too has moved to the cloud.

    In a sense cloud hardware is very generic. You want the most bang for the buck. So I hear people now use graphics cards to, yes, get more memory close to the processors in big data, parallel cloud processing applications.

    Maybe MDC is something for large specialized projects like Square Kilometer Array [SKA]?

  7. A somewhat dated comment of mine, but still relevant:

    In the modern processor, the fastest component is the CPU – the central processing unit that actually performs the computations. To get the instructions and data it needs to operate, the CPU relies on a hierarchy of memory systems. This is necessary because memory that is fast — near-CPU speed — is expensive. The CPU-speed data stores are the registers on the CPU.

    The current hierarchy of memory subsystems is
    L1 cache
    L2 cache
    L3 cache
    RAM
    Hard disk

    Latency measures the time it takes between the CPU making a request for something from memory and starting to get the reply. Here is an excerpt from a table from Anandtech for Intel’s latest processor, the Core i7 (Nehalem); I expect the numbers are typical for any processor running at a few gigahertz.

    L1 latency – 4 CPU cycles
    L2 latency – 11 CPU cycles
    L3 latency – 39 CPU cycles
    RAM latency – 107 CPU cycles
    Hard disk latency – approx 10.5 million cycles.

    To put the issue of latency into a human context, let us say humans effectively operate at the level of speech or writing at one cycle per second. Imagine two humans conversing. Then L1 cache is like a really slow conversation, taking 4 seconds to receive the response to a question. RAM latency is like waiting for nearly two minutes. Hard disk latency, at 10.5 million seconds, is like waiting for about four months!

    In this analogy, SSD {solid-state disk} latency is equivalent to roughly two days.

  8. It’s not really clear what HP has achieved. The big data they are referring to usually resides on remote machines. Maybe that is the point of the name “The Machine”. You need to get all your data on a single machine for the new memory format, optical links and architecture (whatever that is) to be useful. HP is desperate to invent something or anything. They have had the word “invent” in their corporate tag lines for years, but I don’t think anybody can name a single 21st century HP invention, despite probably 10Ks of HP patents. It’s difficult in a top-down, “sponsoring executive”, title-laden corporate culture. It may be better to build a machine to search / create start-ups really effectively and to buy your inventions that way.

  9. It’s nice to see progress on this project – the last time I looked, it was on the edge of failure as “memristor” persistent memory was not getting to production as fast as people had expected. As Arun points out above, The Machine eliminates the HD and SSD layers of the memory hierarchy, providing terabytes & petabytes of storage that run at RAM speed. If you’re old enough, you may remember how you could configure Linux and even DOS with a “ramdisk” that would eliminate the wait for data to be read in and out of floppy drives. This new architecture recreates that style of computing once again, but at 21st century scale.

    I’m no computational physicist, but I’d think that simulations would be a richer field for memory-centric computing than simple data analyses of ill-structured data. For a lot of physics simulations, all the interactions are local, and data can be streamed in from disk or from an adjacent CPU with a lot of predictable adjacent structure in the data, and this can be made quite speedy with clever data layouts. But when the data is irregular and you can’t know in advance whether the next data item you’re going to need is from nearby or far away, the wait for a faraway data item to spin around on its disk to a read/write head can be deadly. I would think of plasma physics confinement studies, where every particle in a tokamak or stellarator load is connected to every other one at nearly the speed of light, and events on one side of the ring of plasma can lead to instabilities on the other side before any instrumentation notices anything out of the ordinary. Restructuring a simulation code to avoid artificial locality created by algorithms structured to run on simple array-parallel supercomputers could lead to significant accuracy improvement.

    Or for simulations where the “spooky action at a distance” of quantum entanglement becomes significant. Simulations of the detailed interactions at the boundary between neutronium and ordinary matter at the surface of a neutron star might become feasible with this kind of computer architecture. I understand that Erik Verlinde’s theory of gravity uses entanglement quite deeply – there may be phenomena that emerge from detailed simulations that simple bulk calculations won’t show. Likewise with the quark-gluon plasma in the CERN LHC and early stages of the big bang.

  10. “It’s nice to see progress on this project – the last time I looked, it was on the edge of failure as “memristor” persistent memory was not getting to production as fast as people had expected. As Arun points out above, **The Machine eliminates the HD and SSD layers of the memory hierarchy**, providing terabytes & petabytes of storage that run at RAM speed.”

    It does not. Memristors are not commercially viable and will not be for a long time. The first versions of The Machine will continue to use DRAM and NAND for memory and storage, respectively.

    Fortunately, there’s a memory company (the Intel-Micron joint venture) that *is* planning to start production, in 2017, of a fast, non-volatile memory that is also cheaper than DRAM, called 3D XPoint. The nice thing about that is that the usage of 3D XPoint will not be limited to one company’s products (unlike HPE’s memristors); you can buy the memory straight from the source and put it in your own systems.

  11. The Machine is descriptive enough for a name as long as HP thinks they are Harold Finch.
    I would be worried if they called it Samaritan. 🙂

  12. I read on the AMD website, a while back, that the biggest problem they have with keeping up with Moore’s Law is the doping process which inscribes the logic gates on the chips. It is starting to get too costly to make wafers with fewer errors, because the errors appear in the wafers randomly. Then chip manufacturers have to throw out a lot of wafers in order to only have the ones of the quality people want. Then Intel decided to develop hyperthreading, which cuts process speeds in half on each bit, instead of trying to develop a 64 bit chip like AMD. Meanwhile, programmers still only make 64 bit games even though we have had computers with quad cores and more for some time now. Microsoft has developed a way for 64 bit programs to use more cores, but the programs get exponentially less work done with each additional core. Adding more cores won’t run the programs we currently use that much more effectively. Then the game industry is reluctant to make games which are higher end, because they are always afraid of people not having a computer that can run them; that would cut into the profits from sales. Also, it can take around 5 years or more to develop a game or program, and they design it to be used with the technology they started with at that time.
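
As a rough check on the latency analogy in comment 7, the quoted cycle counts can be converted to human-scale waits at one cycle per second. This is a minimal sketch: the dictionary and the `human_scale` helper are illustrative names, and the only inputs are the figures quoted in that comment.

```python
# Cycle counts as quoted in comment 7 (attributed there to Anandtech, Core i7 Nehalem).
LATENCY_CYCLES = {
    "L1 cache": 4,
    "L2 cache": 11,
    "L3 cache": 39,
    "RAM": 107,
    "Hard disk": 10_500_000,
}


def human_scale(cycles: int) -> str:
    """Express a latency as a wait time, assuming one cycle equals one second."""
    seconds = cycles
    if seconds < 60:
        return f"{seconds} seconds"
    if seconds < 3600:
        return f"{seconds / 60:.1f} minutes"
    if seconds < 86_400:
        return f"{seconds / 3600:.1f} hours"
    return f"{seconds / 86_400:.1f} days"


if __name__ == "__main__":
    for level, cycles in LATENCY_CYCLES.items():
        print(f"{level:>10}: {human_scale(cycles)}")
```

On these figures the RAM wait is just under two minutes and the hard disk wait is about 121.5 days, or roughly four months.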