Data Systems Architecture — Past, Present and Future

Karthik Mahalingam
3 min read · Oct 16, 2021

One of the topics that has made me curious recently is how application architecture will evolve after Moore’s law.

Some suggest that Moore’s law has already ended, while others argue that our interpretation of Moore’s law is wrong and that we are still keeping up with it.

There is some merit to both arguments. Yes, we are continuing to pack more transistors into a small footprint, thanks to advancements in 3DIC and chip packaging.

Are CPUs getting any faster?

Yes, but maybe not a whole lot. There are many factors at play: thermal efficiency, power consumption, and so on. This is one of the reasons why we started offloading more tasks to GPUs, FPGAs, ASICs, and so on.

Implications for application / data systems architecture

The diagram below is a highly abstracted view of system architecture from the past to the present (distributed systems). One fact that is quite obvious, but still worth calling out, is that architecture evolves with its ecosystem, in which hardware plays a vital role.

Data Systems Architecture (Past and Present)

In the past, we separated only the data. In modern distributed architecture, we separate a few more things: Redis for application state, Azure App Configuration for configuration, and Cosmos DB or RDS as a distributed data store. There are a few good reasons for that (a short sketch follows the list):

  1. Config, state, and metrics management are complex problems, and there are tools purpose-built to address them.
  2. You want to minimize the overall footprint of your application to be as small as possible.
  3. As Murphy’s law states, if anything can go wrong, it will. You want to avoid correlated failures.

… many more
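
To make reason 2 concrete, here is a minimal sketch of what externalizing application state looks like in practice: the process keeps nothing important in memory, so any replica can serve any request and a crashed instance loses nothing. It assumes a local Redis instance, and the shopping-cart keys and functions are purely illustrative.

```python
# A minimal sketch of externalized application state, assuming a local
# Redis instance. Key names and the cart example are illustrative, not
# a prescribed layout.
import json

import redis

# State lives outside the process; any replica can read or write it.
state = redis.Redis(host="localhost", port=6379, decode_responses=True)

def add_cart_item(user_id: str, item: dict) -> None:
    # Append the item to the user's cart, held in Redis rather than in
    # process memory, so a restart or failover loses nothing.
    state.rpush(f"cart:{user_id}", json.dumps(item))

def get_cart(user_id: str) -> list:
    # Rebuild the cart from Redis; the process itself stays stateless.
    return [json.loads(raw) for raw in state.lrange(f"cart:{user_id}", 0, -1)]
```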

Are we ready for the post-Moore's law era?

To process the massive amounts of data generated every second by IoT devices, mobile phones, click streams, and so on, we need compute! Moving compute to the data is a fascinating idea. Through region replication, we have certainly moved compute closer to where the data is generated, but not close enough. Our ability to process data quickly is limited by a software stack that sits on top of multiple layers of abstraction (hypervisor, OS, container, and so on).

Enter computational storage

Computational storage is one such concept, and a fascinating and important industry trend to follow. There are several variations of computational storage architecture, but the one thing they all share is moving compute as close to the data as possible.

Computational Storage Architecture

This helps services that want to perform in-place data transformation tasks, and it enables several promising use cases, such as millions of tiny databases that store and query data in the storage hardware itself.
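
Here is a conceptual sketch of that pushdown idea, written as ordinary Python rather than a real device API: the predicate executes where the data lives, and only matching records cross the wire. The StorageNode class and the events.jsonl file are hypothetical stand-ins.

```python
# A conceptual sketch of the pushdown behind computational storage, not
# a real device API. StorageNode and events.jsonl are hypothetical.
import json
from typing import Callable, Iterator

class StorageNode:
    """Stands in for a storage device with on-board compute."""

    def __init__(self, path: str):
        self.path = path  # newline-delimited JSON records on the device

    def scan(self, predicate: Callable[[dict], bool]) -> Iterator[dict]:
        # On a real computational storage device this loop would run on
        # the device itself; the host would receive only the matches.
        with open(self.path) as f:
            for line in f:
                record = json.loads(line)
                if predicate(record):
                    yield record

# Host side: ship the predicate down instead of pulling every record up.
node = StorageNode("events.jsonl")
errors = list(node.scan(lambda r: r.get("level") == "error"))
```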

Good first steps for application architects and developers

  1. Think of and design services at function scope (small and very specific) so that they can be deployed across millions of storage nodes (see the sketch after this list).
  2. Consider a cross-platform application stack: .NET Core, Java, etc.
  3. Consider serverless architecture; chances are the cloud providers will eventually transition to computational storage.
  4. Unikernels are another exciting area that has been gaining lots of traction recently. (Unikernels deserve a separate post of their own, which I hope to cover in a future blog post.)
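
As a sketch of point 1, this is the sort of unit I have in mind: one small, stateless, single-purpose function with no framework baggage, easy to replicate across many storage nodes. The event shape and handler signature are hypothetical and not tied to any particular serverless runtime.

```python
# A minimal sketch of a function-scoped service. The event shape and
# handler signature are hypothetical, not a specific runtime's API.
def handle(event: dict) -> dict:
    # Do exactly one thing: summarize the records this node holds.
    records = event.get("records", [])
    total_bytes = sum(r.get("bytes", 0) for r in records)
    return {"count": len(records), "total_bytes": total_bytes}

# Local usage example; a runtime would invoke handle() once per event.
print(handle({"records": [{"bytes": 120}, {"bytes": 80}]}))
```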

These are some of my thoughts; this blog post is by no means complete. I hope to share more as I learn, and I hope you find this post useful. Please like and follow!


Karthik Mahalingam

Cloud and Distributed Computing Enthusiast with 13+ years of experience; Works @ Microsoft; Opinions are my own and not the views of my employer.