Digital transformation with streaming application development

Maria Kucharczyk
SoftwareMill Tech Blog
Jul 16, 2019 · 9 min read


As digitalisation matures, data analytics becomes part of a tough battle for market share. If your company is not fully data-driven yet, it will have to become so to compete on the market. Whatever you're selling, you have to collect quality information about your customers and tap into real-time analytics to respond quickly to fluctuating market trends.

In this post you will learn:

  • what the key elements of digital transformation are,
  • why stream processing (streaming application development) is one of them,
  • which companies are disrupting their markets through data.

What is digital transformation?

No one disputes that organisations have more data than ever at their disposal: data from online transactions, online customer behaviour data, data generated by bots automatically crawling the Internet, data from users' individual online journeys, data from various aggregation systems, and so on. An immense volume of information arrives at high velocity, and the goal of every organisation is to analyse it and make sound decisions based on the insights gained.

One thing is certain: every digital transformation is going to begin and end with the customer. Integrating different types of customer data serves a single goal: implementing a customer-centric approach to doing business, one that focuses on providing a positive customer experience. The ability to leverage data from different sources is at the core of your competitiveness; only then can you react to customers' needs in real time, as worldwide connectivity increases and ever-changing customer preferences and behaviours continue to evolve.

When we talk about collecting data about customers, we are typically referring to the 5V characteristics of Big Data:

  1. Volume: the amount of data being collected is huge
  2. Velocity: the speed of data generation is rapid
  3. Variety: there are different types of data
  4. Veracity: the data must be credible and accurate
  5. Value: the actual value for decision making processes

Digital transformation aims to overcome the complexity of digitalisation and big data, so that the speed of action matches the speed of opportunity. A business that deploys the right technology can respond quickly to market openings by harnessing insights across the whole organisation.

How to undergo a digital transformation?

Digital transformation can mean different things to different businesses in a hybrid IT world, because there are ever more technologies to choose from, to mention just a few: advanced analytics and data aggregation systems, distributed systems like blockchain, AI and IoT systems, and so on.

Regardless of the technology and the business model you operate in, the path to digital transformation you choose for your company has to aim at adding value to every customer interaction.

You need to ask yourself a big question:

How do we improve our processes to enable better decision-making and a personalised customer experience?

Once you figure that out, there are, to put it simply, three steps.

1. Implement data convergence, so that analytic insights can impact your business operations.

2. Implement stream processing, which will help you take the leap forward and become a real-time business.

I asked our expert, Integration Architect Jarosław Kijanowski, to shed more light on the technical details of this and the following step.

Stream processing is a big data technology that enables you to process data in motion: to quickly determine what is working, what is not, and what the current state of ever-changing data streams is, whether they come from sales leads, supply chain updates, transactions, social media activity, customer orders or chat messages.

A stream processing application typically consists of sources, transformation logic and outputs. The sources collect data from various providers, such as Google Analytics, databases and third-party services like logging and metrics cloud platforms. This data is made available to the transformation logic in the form of events, which can be processed one by one or in batches. The most popular stream processing engines, like Apache Kafka, Apache Beam or Apache Spark, allow you to perform common operations (a minimal sketch follows the list):

  • mapping one message to another based on content or other available data,
  • filtering out messages not fulfilling required criteria,
  • joining streams of data to produce one enriched stream,
  • aggregating messages and computing a single message for each given period of time.
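
To make these operations concrete, here is a minimal sketch of such a pipeline written against the Kafka Streams Scala DSL (a recent kafka-streams-scala artifact is assumed; the topic names "orders" and "order-totals" and the customer-id-to-amount payload are invented for illustration). It maps raw order events to amounts, filters out invalid ones and aggregates a total per customer for every five-minute window:

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.streams.kstream.TimeWindows
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.serialization.Serdes._
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}

object OrderTotalsApp extends App {
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-totals-app")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

  val builder = new StreamsBuilder()

  builder
    .stream[String, String]("orders")        // source: customerId -> order amount as text
    .mapValues(_.toDouble)                   // map: parse the payload
    .filter((_, amount) => amount > 0.0)     // filter: drop refunds and junk
    .groupByKey                              // aggregate per customer ...
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5))) // ... per 5-minute window
    .reduce(_ + _)                           // a single total per customer per window
    .toStream
    .map((windowedKey, total) => (windowedKey.key, total.toString))
    .to("order-totals")                      // output: an enriched stream for other services

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()
  sys.addShutdownHook(streams.close())
}
```

Joining two streams that share a key follows the same pattern, producing one enriched stream out of, say, orders and customer profiles.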

These messages may be persisted for audit purposes, or so that a given period of time can be replayed with different transformation logic applied. Finally, they are made available to other services, typically developed and hosted in-house. Visualising the final results lets you discover trends, set up alerts and analyse the big picture; reporting, on the other hand, supports the company's document flow.

The main selling point of stream processing applications is their scalability. They process messages in real time, allowing the business, or even automated bots, to act immediately. The IT industry is already there and ready for the mass market: it is no longer only stock exchanges, where trades and other transactions have to be executed with millisecond latency, that can and must afford such systems. Cloud providers offer the whole stack, covering mature tools for orchestration, scaling, monitoring and tracing, all in an automated fashion.

3. Build an event-driven microservices architecture to make your applications agile.

A couple of years ago there was a major shift in the way architects design IT systems. Standalone applications, so-called monoliths, were no longer able to keep up with demands on several levels. The business demands to have its ideas implemented and shipped by the end of the day. On the development level, multiple teams need to be able to quickly grasp how a particular part of the system works while avoiding stepping on each other's toes. On the devops side, it is no longer possible to scale by simply adding more CPUs and memory.

Microservices seem to be the new black. The idea is to split a monolith into multiple modules. Features can then be implemented by adding or changing business logic in one small package. Teams take responsibility for rather narrow areas of expertise, which keeps them independent and lets them onboard new members quickly thanks to a low learning curve. Fully automated release pipelines make it possible to have new code in production within minutes. Finally, scaling small, independent modules is no longer restricted by the limited power of a single machine's CPUs, but becomes a matter of the all-but-unlimited resources available at your cloud provider.

A microservices architecture, however, is not a silver bullet. The most complex topic is communication. Microservices do not have to talk to each other directly, and that is one of the main characteristics increasing availability, since the downtime of one service does not bring down the whole system. Likewise, an application under excessive load will not collapse just because one of the underlying modules, busy with some heavy computation, does not reply on time.

This is a well-explored pattern that relies on asynchronous communication (a sketch of the flow follows below). One module sends a message, or rather a command, to an intermediate messaging layer and does not wait for a reply; it is immediately ready to serve another request and issue another command. A module interested in a particular type of command picks up the relevant messages and starts processing them. Once done, it sends a message, or rather a notification, about the completion. Again, whoever is interested in such a notification may pick it up and act accordingly. There are no direct dependencies between modules, and even if a module is down or slow, it won't affect the rest of the system.
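
Here is a minimal sketch of that command-and-notification flow using the plain Kafka clients from Scala. The topic names ("invoice-commands", "invoice-notifications"), the group id and the payloads are all invented for illustration, and a real module would of course run the consumer side in a poll loop:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}

object AsyncCommandSketch extends App {
  val producerProps = new Properties()
  producerProps.put("bootstrap.servers", "localhost:9092")
  producerProps.put("key.serializer", classOf[StringSerializer].getName)
  producerProps.put("value.serializer", classOf[StringSerializer].getName)

  // Module A: publish a command to the intermediate layer and move on.
  // No reply is awaited; the module is free to serve the next request.
  val producer = new KafkaProducer[String, String](producerProps)
  producer.send(new ProducerRecord("invoice-commands", "order-42", "CreateInvoice"))

  // Module B: pick up the commands it is interested in, process them,
  // then announce the completion as a notification.
  val consumerProps = new Properties()
  consumerProps.put("bootstrap.servers", "localhost:9092")
  consumerProps.put("group.id", "invoicing-module")
  consumerProps.put("key.deserializer", classOf[StringDeserializer].getName)
  consumerProps.put("value.deserializer", classOf[StringDeserializer].getName)
  val consumer = new KafkaConsumer[String, String](consumerProps)
  consumer.subscribe(Collections.singletonList("invoice-commands"))

  for (record <- consumer.poll(Duration.ofSeconds(1)).asScala) {
    // ... create the invoice here, then tell whoever is interested:
    producer.send(new ProducerRecord("invoice-notifications", record.key, "InvoiceCreated"))
  }
  producer.close(); consumer.close()
}
```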

To introduce this pattern, a message broker is typically set up; one of the most popular tools is Apache Kafka. It serves as the intermediate communication layer and decouples modules from each other. Scalability is achieved by splitting a communication channel, called a topic, into multiple partitions, which allows you to run as many instances of a given module as there are partitions. An orchestration tool like Kubernetes can automate this process and scale a module up or down depending on the number of unprocessed messages.
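
As an illustration, a partitioned topic like the one above could be created programmatically with Kafka's AdminClient; the topic name and the partition and replication counts below are arbitrary. The lag-based autoscaling itself lives on the Kubernetes side, typically in a tool that watches the consumer group's unprocessed-message count:

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}

object CreatePartitionedTopic extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  val admin = AdminClient.create(props)

  // Eight partitions allow up to eight instances of the consuming
  // module to process the topic in parallel within one consumer group.
  val topic = new NewTopic("invoice-commands", 8, 3.toShort) // name, partitions, replication factor
  admin.createTopics(Collections.singletonList(topic)).all().get()
  admin.close()
}
```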

Another interesting aspect of a message broker that provides persistence is the ability to replay all messages in case a module was not functioning properly. Unlike in real life, we get a second chance: a fixed version of the module can start from the beginning and process all the messages again.
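
A sketch of such a replay with a plain Kafka consumer in Scala (the topic and group names are again invented; resetting the group's offsets with the kafka-consumer-groups command-line tool would achieve the same):

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

object ReplayFromBeginning extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("group.id", "invoicing-module-v2") // the fixed version of the module
  props.put("key.deserializer", classOf[StringDeserializer].getName)
  props.put("value.deserializer", classOf[StringDeserializer].getName)

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singletonList("invoice-commands"))
  consumer.poll(Duration.ofSeconds(1))            // join the group and get partitions assigned
  consumer.seekToBeginning(consumer.assignment()) // rewind to the oldest retained message

  // From here on, poll() returns the full history through the fixed logic.
  consumer.poll(Duration.ofSeconds(1)).asScala.foreach { record =>
    println(s"replaying ${record.key}: ${record.value}")
  }
  consumer.close()
}
```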

The shift towards event-driven microservice architectures makes applications agile, not only from a development and devops point of view, but especially from a business perspective.

Companies disrupting through data

The most successful brands embrace the fact that every customer is different and has changing preferences. Consumers used to interact with products and services differently; things have changed, and now they are better informed and link their purchase decisions to numerous factors.

The modern consumer:

  • relies heavily on product recommendations and Google search, both in B2B and B2C,
  • shops from wherever is most convenient, across different devices and platforms,
  • is more reluctant to trust brands, so earning their loyalty becomes more challenging, for example as concerns about data privacy grow,
  • finally, wants to be treated as an individual and demands more relevance.

Here are some examples of companies that use Big Data and are winning the personalisation and optimisation game.

NIKE MELROSE — retail

With the NikePlus app, Nike elevates the local consumer shopping experience. In its store in Melrose, LA, the company offers NikePlus members city-specific styles determined by Nike's digital commerce data (buying patterns, app usage and engagement). Consumers from the area get access to exactly what they want, when they want it.

QLIK — healthcare

Qlik's software is an analytical engine that allows physicians and care coordinators to tap into clinical and financial data and make smarter, data-driven decisions regarding clinical pathways, patient costs or the prevention of various diseases.

LPP S.A. — eCommerce

LPP S.A. collects online customer data from various sources for its 5 different online stores. The two biggest inputs are orders, stored in databases, and customers' online behaviour, recorded with Google Analytics. Massive data volumes demand not just huge storage capacity, but also the ability to scale during peak hours, 24/7. After implementing streaming applications and a real-time analytics infrastructure, the company is able to compete with market leaders and leverage recommendations for a personalised customer experience.

Read the case study.

UPS — logistics

UPS delivers 20.7 million packages and documents daily to people all over the world. The company uses big data: on-truck telematics and advanced algorithms that optimise fleet routes, engine idle time and predictive maintenance. In 2012 alone the company saved over 1.5 million gallons of fuel (39 million gallons since starting the programme in 2001) and avoided driving 12.2 million miles, significantly reducing greenhouse gas emissions.

Source: https://sustainability.ups.com/media/UPS-Big-Data-Infographic.pdf

UPS also uses big data to improve decision-making across its package delivery network, leveraging data from chatbots, Facebook and Amazon Echo.

UBER — ridesharing

The UberPool ridesharing service, built on a patent Uber filed in 2017, makes riding with strangers slightly less creepy by showing passengers the friends and interests they have in common, based on their Facebook data. On top of that, Uber can offer a suitable, cost-effective fare.

NETFLIX — SVOD

Netflix has 71% of the global SVOD market, and more than 75% of Netflix user activity is driven by its recommendation system, based on viewing habits: users usually watch whatever Netflix suggests after logging in, instead of browsing to pick a movie or show.

Netflix uses Scala in its Big Data projects. The language goes well with the Netflix platform and the JVM ecosystem, and is used to build stable, RESTful APIs for the search mechanisms behind Netflix's ML-based viewing recommendations.

Wrap up

Every company is different; however, when developing its own digital transformation strategy, each must think about customer experience, operational agility, culture and leadership, workforce enablement, and digital technology integration.

Need a reliable tech partner that gives you the freedom to adapt your systems with ease to your business needs? We are experts in developing scalable, resilient and performant applications built on the solid foundations of Scala, Akka, Java, Kafka and Cassandra, among others, which power a significant portion of business-critical applications. Contact us here!
