The Next Phase of FinOps: 3 AI-Powered Moves That Matter https://www.rsystems.com/blogs/the-next-phase-of-finops-3-ai-powered-moves-that-matter/ Wed, 18 Feb 2026 11:37:42 +0000 https://www.rsystems.com/?p=43503 Cloud costs rarely spiral out of control overnight. More often, they drift quietly and steadily until finance teams are left explaining overruns and engineering teams are asked to “optimize” after the fact. This reactive approach to FinOps is becoming harder to sustain. Cloud environments today are far more dynamic than the tools and processes designed […]

Cloud costs rarely spiral out of control overnight. More often, they drift quietly and steadily until finance teams are left explaining overruns and engineering teams are asked to “optimize” after the fact.

This reactive approach to FinOps is becoming harder to sustain. Cloud environments today are far more dynamic than the tools and processes designed to manage them. Monthly reviews, static rules, and backward-looking reports simply cannot keep up.

This is where AI-driven FinOps steps in: not as another dashboard, but as the next evolution of FinOps itself, one that helps teams predict what’s coming, prevent waste before it happens, and continuously improve performance.

From Cost Visibility to Cost Intelligence

Traditional FinOps gives you visibility. You can see where money is being spent, which teams own which resources, and how costs trend over time. That foundation still matters.

But visibility alone doesn’t answer the questions that really matter now:

  • Where is spend likely to increase next?
  • Which workloads are behaving differently than expected?
  • What should teams act on today, not at the end of the month?

AI adds intelligence to FinOps by connecting historical patterns with real-time data. Instead of just reporting on spend, AI helps teams understand why costs are changing and what to do about it.

Predict: Forecasting That Keeps Up with Change

Forecasting cloud spend has always been difficult. Usage shifts with new releases, customer demand, and infrastructure changes, often making static forecasts outdated almost as soon as they’re created.

AI-driven FinOps improves this by:

  • Continuously forecasting spend using live usage data
  • Learning from patterns like seasonality and growth trends
  • Adjusting predictions as workloads and architectures evolve

The result is forecasting that feels less like guesswork and more like guidance. Finance teams gain clearer budget visibility, while engineering teams better understand how their decisions shape future costs.
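
To make the pattern concrete, here is a minimal sketch (synthetic data, illustrative only, and not a description of any specific R Systems tooling): a seasonality-aware model is re-fitted on recent usage so the forecast keeps adjusting as workloads change.

  import numpy as np
  import pandas as pd
  from statsmodels.tsa.holtwinters import ExponentialSmoothing

  # Synthetic daily spend with weekly seasonality and mild growth (stand-in for live billing data)
  days = pd.date_range("2025-01-01", periods=120, freq="D")
  spend = 1000 + 2 * np.arange(120) + 50 * np.sin(2 * np.pi * np.arange(120) / 7)
  history = pd.Series(spend, index=days)

  # Re-fit on the latest data each day or week so predictions adjust as the estate evolves
  model = ExponentialSmoothing(history, trend="add", seasonal="add", seasonal_periods=7).fit()
  print(model.forecast(14))  # expected spend for the next two weeks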

Prevent: Catching Anomalies Before They Become Problems

In many organizations, cost anomalies are discovered only after the bill arrives. By then, teams are already behind.

AI changes that dynamic. By learning what “normal” looks like for each workload, AI-powered FinOps tools can spot unusual behavior as it happens, whether it’s a sudden traffic spike, a misconfigured autoscaling rule, or resources running idle longer than expected.

Even more important, these alerts are contextual. They don’t just flag a spike; they explain where it’s coming from and why it matters. That clarity helps teams respond faster, with less finger-pointing and fewer manual investigations.
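
As a simple illustration of the idea (again synthetic and illustrative, not a production detection engine), the sketch below learns a per-workload baseline from recent history and flags spend that falls far outside it:

  import numpy as np
  import pandas as pd

  # Hypothetical daily cost series for one workload; real data would come from billing exports
  rng = np.random.default_rng(0)
  costs = pd.Series(100 + rng.normal(0, 5, 60))
  costs.iloc[-1] = 160  # simulate a sudden spike

  # Learn "normal" from the preceding two weeks, then flag points far outside it
  baseline = costs.shift(1).rolling(window=14).mean()
  spread = costs.shift(1).rolling(window=14).std()
  z_score = (costs - baseline) / spread

  print(costs[z_score.abs() > 3])  # candidate anomalies to investigate, with context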

Perform: Continuous Optimization, Not Periodic Cleanup

FinOps works best when finance and engineering operate as partners, not gatekeepers and enforcers. AI makes that collaboration easier by translating complex cost data into insights each team can act on.

With predictive insights in place:

  • Finance teams can focus on planning and accountability, not policing
  • Engineering teams can design with cost in mind, without slowing delivery
  • Optimization becomes ongoing, not something squeezed into quarterly reviews

Savings are identified earlier, responses are faster, and performance goals stay intact, all without adding operational overhead.

Case Study: Optimizing Petabyte-Scale Workloads for Cost and Continuity

The value of AI-driven FinOps becomes clear at scale.

A content-intelligence platform processing petabytes of data every day needed to control cloud costs without compromising performance or availability. Manual reviews and static optimization rules were no longer enough.

By introducing predictive planning and real-time anomaly detection, the organization gained early visibility into cost deviations and the ability to act before issues escalated.

The results were tangible:

  • 20% reduction in cloud costs
  • Improved continuity and workload performance
  • Faster response times with minimal manual effort

AI didn’t just reduce spend; it made cost management more predictable and less disruptive.
Read the full story here: Optimizing Petabyte-Scale Workloads for Cost and Continuity – R Systems

The R Systems Approach: AI-Powered FinOps, Built for Continuous Optimization

AI is powerful, but it delivers real value only when embedded into everyday cloud operations.

R Systems brings together AI-driven forecasting and anomaly detection with continuous optimization practices that align finance, engineering, and operations. The focus is not on one-time savings, but on building a FinOps capability that evolves alongside the cloud environment.

The outcome is a FinOps model that is proactive, collaborative, and resilient, designed to keep pace with both growth and change.

Explore our Cloud FinOps capabilities to learn more.

Why AI-Driven FinOps Matters Now

As cloud environments grow more complex, the cost of reacting late keeps rising. AI-driven FinOps offers a practical alternative: predict earlier, prevent waste, and perform with confidence.

For organizations that see cloud efficiency as a long-term discipline rather than a quarterly exercise, AI is no longer optional. It is foundational.

Let’s move forward together. Start the journey — talk to our Cloud FinOps experts today.

Choosing the Right Partner: Why Agentic AI Success Depends Less on Tools and More on Who You Build With https://www.rsystems.com/blogs/choosing-the-right-partner-why-agentic-ai-success-depends-less-on-tools-and-more-on-who-you-build-with/ Tue, 17 Feb 2026 12:02:11 +0000 https://www.rsystems.com/?p=43497 Agentic AI has moved quickly from experimentation to expectation. Most enterprises today have pilots in motion, proofs of concept delivering early promise, and leadership teams asking a sharper question: How do we scale this safely, reliably, and with real business impact? That question is often followed by fatigue. Too many pilots stall. Too many promising […]

Agentic AI has moved quickly from experimentation to expectation. Most enterprises today have pilots in motion, proofs of concept delivering early promise, and leadership teams asking a sharper question: How do we scale this safely, reliably, and with real business impact?

That question is often followed by fatigue. Too many pilots stall. Too many promising demos fail to survive real-world complexity. And too often, the issue isn’t the technology itself.

The uncomfortable truth is this: most agentic AI failures are not technology failures. They are partner failures.

As enterprises move from pilots to production, especially within Global Capability Centers (GCCs), partner selection has become a strategic decision, not a procurement one. The difference between experimentation and enterprise value increasingly comes down to who you build with.

Why Partner Choice Matters More Than Ever

Agentic AI is fundamentally different from earlier waves of automation. It introduces autonomy into business workflows: systems that can sense, decide, and act with limited human intervention.

That kind of capability doesn’t scale through tools alone.

Scaling agentic AI requires deep enterprise context, operating-model alignment, strong governance, and ownership of outcomes. Yet many organizations still choose partners based on narrow criteria: a compelling demo, a preferred toolset, or short-term cost efficiency.

Those choices may work for pilots. They rarely work for production.

As organizations mature, a clear realization is emerging: the partner matters as much as the platform, and often more.

Innovation Readiness Is Not Optional

Agentic AI is advancing faster than most enterprise operating models can comfortably absorb. New orchestration patterns, reasoning techniques, safety mechanisms, and runtime optimizations are emerging at a pace that outstrips traditional delivery and governance cycles.

In such an environment, partner capability cannot remain static. Enterprises need partners with a sustained capacity for innovation, not merely the ability to implement what is already familiar.

The most effective agentic AI partners operate through a mature AI Center of Excellence: one that systematically experiments, evaluates new tools and approaches, and converts what proves viable into production-ready practices before they enter core enterprise systems.

Without this discipline, organizations risk committing too early to architectural choices that do not age well: choices that introduce technical debt, constrain future evolution, and limit the scope of autonomy over time.

Innovation readiness in agentic AI, then, is not a matter of chasing what is new. It is the ability to distinguish signal from noise, to decide deliberately what belongs in production, and to industrialize proven approaches with consistency, safety, and repeatability.

The Common Partner Pitfalls

Most enterprises don’t choose the wrong partners intentionally. They choose partners that are right for a different stage of maturity.

Some common pitfalls we see:

  • Tool-first vendors who excel at showcasing AI capabilities but lack experience running mission-critical enterprise systems.
  • Traditional system integrators with scale and delivery muscle, but limited depth in agentic AI design and orchestration.
  • Niche AI firms that can build impressive pilots but struggle with integration, governance, and long-term operations.
  • Delivery partners focused on execution rather than accountability, leaving enterprises to own risk, outcomes, and scale alone.
  • Partners who lack domain or functional depth, resulting in agents that understand tools but not the business context, decision logic, or real operational constraints.

None of these partners are inherently flawed. But agentic AI demands a broader, more integrated capability set.

The Agentic AI Partner Readiness Checklist

Before trusting a partner to take agentic AI into production, leaders should ask a simpler, more direct question:

Can this partner scale autonomy responsibly inside my enterprise?

Here is a practical checklist to help answer that question.

1. Enterprise & GCC Readiness

  • Has this partner run large-scale, production systems and not just pilots?
  • Do they understand GCC operating models, governance structures, and decision rights?
  • Can they embed AI ownership into teams, not just deliver projects?

2. Agentic AI Depth

  • Do they go beyond chatbots and copilots?
  • Have they designed and deployed multi-agent systems in real environments?
  • Do they build in human-in-the-loop controls by default?

3. Scalability & Reusability

  • Do they think in platforms, not one-off agents?
  • Can their solutions be reused across functions and workflows?
  • Are observability and lifecycle management part of the design, not just an afterthought?

4. Data & Integration Maturity

  • Can they work with messy, legacy, enterprise data?
  • Do they integrate cleanly with core business systems?
  • Is data governance built into the solution from day one?

5. Security, Risk & Governance

  • Are guardrails designed in, not bolted on?
  • Can decisions be explained, audited, and governed?
  • Are solutions built for regulated, compliance-heavy environments?

6. Outcome Ownership

  • Are success metrics tied to business outcomes, not activity?
  • Will the partner co-own KPIs, risk, and accountability?
  • Do they stay invested beyond go-live?

This checklist shifts the conversation from capabilities to credibility.

Why This Checklist Changes the Conversation

Used well, this framework changes how enterprises approach agentic AI adoption.

It shifts the focus from vendors to partners, from pilots to platforms, and from experiments to operating models.

It also makes one thing clear: scaling agentic AI is not a one-time implementation. It is a capability that must be built, governed, and evolved over time.

Organizations that succeed tend to work with partners who understand enterprise realities, operate comfortably inside GCC environments, and engineer autonomy with accountability at the core.

That is where agentic AI becomes sustainable.

The Partner as a Force Multiplier

Agentic AI is not a shortcut. It is a long-term capability play.

The right partner accelerates scale, reduces risk, and protects ROI by ensuring that autonomy is introduced not with disruption but with discipline.

The wrong partner adds complexity, creates fragility, and leaves enterprises managing outcomes they never fully owned.

As leaders move from pilots to production, the question is no longer whether agentic AI can deliver value.

It is whether you have the right partner to deliver it at scale, in the real world, and over time.

Why Domain & Functional Context Make or Break Agentic AI

Agentic AI systems do not simply automate tasks; they make decisions inside business workflows. That makes domain and functional context non-negotiable.

An agent operating in finance, supply chain, customer service, or engineering must understand far more than APIs and prompts. It must respect process boundaries, exception handling, regulatory constraints, and the implicit rules humans apply every day.

Partners without functional or industry depth often build agents that technically work but fail operationally, producing decisions that are correct in isolation yet wrong in context.

The most effective partners combine agentic AI engineering with deep functional understanding, enabling agents to operate with judgment, not just intelligence.

Less Automation, More Trust: Why Tier-2 Operators Should Start Small with AI https://www.rsystems.com/blogs/less-automation-more-trust-why-tier-2-operators-should-start-small-with-ai/ Tue, 03 Feb 2026 11:29:47 +0000 https://www.rsystems.com/?p=43052 Every few months, someone in the telecom space claims that the self-healing network is just around the corner. This has been happening for years. Yet, many regional operators are still handling incidents manually, with their engineers triaging alarms and switching between legacy dashboards and SNMP traps. And the problem isn’t that operators lack ambition, or […]

Every few months, someone in the telecom space claims that the self-healing network is just around the corner. This has been happening for years. Yet, many regional operators are still handling incidents manually, with their engineers triaging alarms and switching between legacy dashboards and SNMP traps.

And the problem isn’t that operators lack ambition or the drive for change – it’s that they don’t trust automation enough. That’s because they’ve learned, often the hard way, that even the smallest glitch can take a stable network down in seconds. This brings us to the real barrier to AI adoption in network operations: not technology, but trust. And honestly, that’s a rational response.

AI’s first job is to earn engineers’ trust, not to replace them

Most automation stories start from an ideal scenario: clean data, cloud-native infrastructure, and teams fluent in DevOps and data science. However, that’s not the reality for most Tier-2 operators. These are lean teams running multi-vendor environments, juggling limited budgets and decades-old systems.

Over more than 20 years in telecom, we at R Systems have worked with operators who’ve run anomaly detection pilots that technically worked but stayed in read-only mode for months because no one in the Network Operations Center (NOC) trusted the system enough to act on its recommendations. That’s a failure of design philosophy rather than of AI. The automation model might be perfect, but if trust is low, it won’t go live.

That’s why your first automation should build trust first and only then trigger growth and digital transformation. It doesn’t need to be a “zero-touch” solution. It needs to be safe and reversible, because engineers trust what they can override.

Start where failure costs are low and wins are visible

From what I’ve seen, in most Tier-2 operators about half of the NOC workload comes from low-impact, repetitive incidents like interface flaps, link degradations, or simple routing resets.

These are the perfect starting points for AI. They happen often enough for models to learn quickly, and even if something goes wrong, the impact is minimal. Automating such tasks can cut alert fatigue dramatically, without touching high-risk infrastructure. The goal isn’t to replace engineering teams, but to help them focus on innovation and growth, while allowing AI to handle high-frequency, low-risk tasks.

Reversible automation builds confidence, one task at a time

Every successful small automation builds political capital for bigger steps. Operators gain confidence when they see an AI system take on simple, reversible tasks and get them right.

Features like explain-why outputs, detailed logs, and one-click rollbacks allow engineers to stay in control. This “supervised automation” mindset is how AI earns its place in runbooks and not the other way around. Because when the NOC team feels that AI is a partner, not a blocker, adoption accelerates naturally.

AI in the NOC: what your first 90 days will look like

If you’re wondering where to start, here’s what’s worked in practice:

Step 1: Identify your top 10 high-frequency, low-risk runbooks.

Work with your NOC managers and subject matter experts to pinpoint repetitive incident types that drain the most time.

Step 2: Roll out AI in read-only mode.

Have the Ops / DevOps teams use it for auto-diagnosis and ticket enrichment. This builds trust with zero risk.

Step 3: Move to supervised automation with rollback options.

Let the AI recommend and occasionally execute known-safe actions, with human oversight, to reduce MTTR and false-positive rates.

If you follow this sequence, you can realistically target a 20–30% reduction in incident triage time within 12 weeks, without ever touching core routing policies.
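
As a toy sketch of what “supervised automation with rollback” can look like in practice (function names and incident details are purely illustrative):

  from dataclasses import dataclass

  @dataclass
  class Recommendation:
      incident_id: str
      action: str      # known-safe runbook step proposed by the model
      rollback: str    # how to undo the action with one click
      reason: str      # "explain why" output shown to the engineer

  def handle(rec: Recommendation) -> None:
      # The AI only recommends; a human approves, and every action carries a rollback.
      print(f"[{rec.incident_id}] AI suggests: {rec.action}\n  why: {rec.reason}")
      if input("Approve? [y/N] ").strip().lower() == "y":
          print(f"executing: {rec.action}")
          print(f"rollback available: {rec.rollback}")
      else:
          print("left for manual handling")

  handle(Recommendation(
      incident_id="INC-1042",
      action="bounce interface ge-0/0/1 on edge-rtr-3",
      rollback="restore previous interface state from the config snapshot",
      reason="flap pattern matches 37 prior benign incidents",
  ))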

What success looks like

A regional fiber ISP ran a small pilot with AI-based anomaly detection on its edge routers. Before the pilot, the six-person NOC was logging 15+ manual tickets every night.

After the AI grouped and labeled similar alarms automatically, that number dropped to just four incidents requiring human confirmation. The mean time to resolution (MTTR) went down by 28%.

That’s not science fiction; it’s what happens when trust comes before automation.

“Start Small” isn’t playing small

Some leaders worry that starting with small, reversible AI automations means they’ll fall behind the big players. Actually, it’s the other way around. Tier-1s often spend years (and millions) chasing “autonomous” dreams, but you can deliver measurable value in 90 days with a laptop, good logs, and the right mindset.

The key is to think of AI not as a leap of faith, but as a series of safe, reversible steps that gradually earn your confidence and your engineers’.

Because the truth is, AI doesn’t need to replace the human operator to transform the NOC. It just needs to make their 2 a.m. shift a little quieter, a little smarter, and a lot more human.

The Insurance Analytics Stack: Future-Proofing Your Investments in BI Tools https://www.rsystems.com/blogs/the-insurance-analytics-stack-future-proofing-your-investments-in-bi-tools/ Tue, 27 Jan 2026 10:32:30 +0000 https://www.rsystems.com/?p=42763 Written by Ashwini Patil
Senior Engineering Manager | Technology & Engineering | R Systems

We have seen the same pattern repeat across insurance clients more times than we can count: a significant investment in a “strategic” BI platform, followed by growing frustration just a few years later. The dashboards still run, but the platform starts to feel heavy. Costs increase. New data sources take longer to onboard. Regulatory requirements evolve faster than the analytics stack can adapt.

For data and BI leaders in insurance, this is not a hypothetical scenario — it’s a familiar one.

The reality is simple: BI tools age faster than most organizations anticipate. Data volumes grow exponentially, operating models change, and regulatory goalposts continue to shift. In our experience at R Systems, the challenge is rarely the BI tool itself; it’s how tightly business logic, governance, and skills are coupled to that tool.

The Reality of Today’s Insurance BI Landscape

There is no such thing as a perfect BI tool — only the right tool for a given context. And in insurance, that context is constantly evolving.

Over the last decade, our teams have worked across a wide spectrum of analytics environments, from mainframe-driven reporting to cloud-native, AI-enabled platforms. Insurance organizations bring unique complexity to this journey: legacy core systems, fragmented actuarial and claims data, strict compliance requirements, and constant pressure to deliver more insight with fewer resources.

Most insurers still rely on a familiar set of BI platforms:

  • MicroStrategy
  • Tableau
  • Qlik
  • Oracle BI
  • And increasingly, Power BI

What we see most often is not a clean replacement of one tool with another, but a multi-tool landscape where new platforms are introduced alongside existing ones. This coexistence phase is where long-term success — or failure — is determined.

The biggest mistake organizations make is assuming that today’s “strategic BI choice” will remain optimal as business priorities, data platforms, and regulatory expectations evolve.

A Candid View of the Major BI Platforms in Insurance

MicroStrategy
We’ve seen MicroStrategy perform extremely well in large insurance environments that demand strong governance, complex security models, and predictable enterprise reporting. It scales reliably and meets regulatory expectations.
At the same time, it can feel restrictive for agile analytics or rapid experimentation, especially when business users seek faster self-service capabilities.

Tableau
Tableau consistently drives high adoption due to its intuitive visual experience. Actuaries, underwriters, and analysts value the ability to explore data quickly and independently.
Where insurers often struggle is governance at scale — particularly as data sources proliferate and business logic fragments across workbooks. Without strong discipline, performance and lineage challenges emerge.

Qlik
Qlik is often underestimated in insurance contexts. Its associative model excels in ad hoc exploration, especially for claims analysis, fraud detection, and investigative use cases.
Challenges tend to arise in deeply governed enterprise scenarios or where long-term extensibility and integration with modern data platforms are priorities.

Oracle BI
Oracle BI remains a common choice for insurers heavily invested in Oracle ecosystems. It offers robust security and strong integration.
However, innovation cycles can be slower, and business-user agility is often limited. Many teams rely on it out of necessity rather than preference.

Power BI and Its Growing Role
Power BI has become a significant part of the insurance analytics conversation. Its integration with modern data platforms such as Databricks and Snowflake, improving enterprise governance, and rapidly evolving AI capabilities have made it a strategic option for many insurers.

In practice, we frequently see Power BI introduced alongside existing BI platforms — supporting executive reporting, self-service analytics, embedded use cases, or AI-driven insights — rather than as an immediate replacement. This coexistence reinforces the need for a flexible, decoupled architecture.

The Hidden Risk: Where Business Logic Lives

Across migrations and modernization programs, one risk appears repeatedly: deeply embedded business logic inside BI semantic layers.

When regulatory calculations, actuarial formulas, and financial metrics are hard-coded into a specific BI tool:

  • Migrations become slow and expensive
  • Parallel runs are difficult to validate
  • Flexibility disappears during mergers, acquisitions, or platform shifts

At that point, the BI tool stops being a presentation layer and becomes a structural constraint.
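
To make the decoupling point concrete, here is a minimal, hypothetical sketch (the column names and loss-ratio example are illustrative, not drawn from a client program): the metric is defined once in a transformation layer and materialized as a governed table, so MicroStrategy, Tableau, Qlik, or Power BI can all consume it without re-implementing the formula.

  import pandas as pd

  # Hypothetical premium/claims extract; in practice this would come from the warehouse
  policies = pd.DataFrame({
      "line_of_business": ["motor", "motor", "property"],
      "earned_premium":   [120_000, 95_000, 210_000],
      "incurred_claims":  [80_000, 61_000, 150_000],
  })

  def loss_ratio_by_lob(df: pd.DataFrame) -> pd.DataFrame:
      """Single, tool-agnostic definition of the loss-ratio metric."""
      grouped = df.groupby("line_of_business", as_index=False).sum(numeric_only=True)
      grouped["loss_ratio"] = grouped["incurred_claims"] / grouped["earned_premium"]
      return grouped

  # Materialize a governed metric table that any BI front end can read
  loss_ratio_by_lob(policies).to_csv("metrics_loss_ratio_by_lob.csv", index=False)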

Five Questions We Use to Future-Proof Insurance BI Decisions

Based on our delivery experience, we encourage insurance BI leaders to ask five critical questions before making — or renewing — a BI investment:

1. How easily can BI tools be swapped or augmented as strategies and vendors change?
Rigid architectures increase risk during integrations and modernization efforts.

2. Can governance models evolve with regulatory and data privacy demands?
Many BI failures stem from brittle access controls and manual processes.

3. How well does the BI layer integrate with modern data platforms and AI services?
Cloud-native and AI-enabled analytics are no longer optional.

4. How is the balance managed between self-service and enterprise control?
Too much freedom leads to chaos; too much control drives shadow IT.

5. Are investments being made in skills and architecture, not just licenses?
Tools change, but strong teams and sound design principles endure.

Lessons Learned From Real Programs

In one engagement, we supported an insurer migrating from Oracle BI to Jasper to improve operations. While the target state made sense, a significant amount of critical logic was embedded in Oracle’s semantic layer. Rebuilding these calculations extended the program timeline by nearly 40%.

In contrast, we’ve worked with insurers who deliberately decoupled their transformation and metric layers from the BI tool. When licensing or strategic priorities shifted, they were able to introduce Power BI with minimal disruption. That architectural choice saved months of effort and reduced long-term risk.

Trends Insurance BI Teams Can No Longer Ignore

Across recent insurance RFPs and transformation programs, several patterns are now consistent:

  • Cloud-native data platforms (Databricks, Snowflake, BigQuery)
  • Power BI and embedded analytics for agents, partners, and customers
  • AI-driven insights and natural language querying
  • Data mesh and data fabric operating models

These are no longer emerging trends — they are current expectations.

Driving Intelligence Across a Leading German Automotive Manufacturer’s Operations with AI-Powered Forecasting https://www.rsystems.com/case-study/driving-intelligence-across-a-leading-german-automotive-manufacturers-operations-with-ai-powered-forecasting/ Tue, 27 Jan 2026 05:26:00 +0000 https://www.rsystems.com/?p=42858 A leading German automotive manufacturer partnered with R Systems to build an AI-powered, enterprise-wide forecasting solution. By unifying finance, logistics, procurement, and sales forecasting through a modular ML architecture, the initiative delivered up to 80% forecast accuracy, automated manual planning processes, and enabled proactive, data-driven decision-making across operations—laying the foundation for scalable AI-led transformation.

  • Enterprise AI Forecasting Framework – Designed and deployed a centralized, modular AI/ML forecasting architecture to unify forecasting across Finance, Logistics, Procurement, and Sales, replacing fragmented, manual processes with a single source of truth. 
  • Accuracy & Predictive Depth – Achieved up to 80% forecast accuracy across freight costs, transport lead times, and sales, with <20% MAPE for daily and weekly bank balance forecasts—delivering reliable short- and long-term visibility across business functions. 
  • Operational Efficiency at Scale – Automated end-to-end forecasting pipelines, significantly reducing manual effort, minimizing human error, and enabling monthly forecast updates with minimal retraining overhead. 
  • Actionable Business Intelligence – Enabled finance, sales, and logistics teams with real-time, role-specific dashboards to support proactive cash flow management, inventory planning, shipment prioritization, and demand-led decision-making. 
  • Modularity, Scalability & Reuse – Implemented a reusable forecasting framework supporting both univariate and multivariate models, allowing rapid extension to new business use cases, profit centers, and data sources without architectural rework. 
  • Strategic Business Impact – Improved planning precision, strengthened cross-functional alignment, and established a scalable AI foundation to support ongoing digital transformation and enterprise-wide forecasting maturity. 
    AI-Powered Multimodal Fusion for Health Risk Prediction https://www.rsystems.com/pov/ai-powered-multimodal-fusion-for-health-risk-prediction/ Tue, 13 Jan 2026 08:44:28 +0000 https://www.rsystems.com/?p=40990 Turning routine EMR data into early, explainable risk predictions for proactive healthcare.

    Authored by
    Vimaldeep Singh
    Associate Vice President- Healthcare

    Predict Health Risks Before They Become Diagnoses

    Chronic diseases like diabetes, cancer, and heart conditions often get detected too late. But what if early warning signals were already hidden inside your EMR data?

    Our POV on AI-Powered Multimodal Fusion reveals how healthcare providers can move from reactive treatment to proactive, data-driven, and explainable risk prediction, without the need for advanced imaging or expensive diagnostics.

    Why This POV Is a Must-Read

    Healthcare organizations are sitting on enormous amounts of clinical data, but very little of it works together. Our POV uncovers how multimodal AI bridges these silos to deliver:

    • Earlier detection of diabetes, cancer, and cardiovascular risks
    • Explainable health insights powered by SHAP and attention mechanisms (a minimal SHAP sketch follows this list)
    • Seamless integration with existing EMR systems
    • Improved clinical decision-making using data you already have
    • Better population health, lower long-term costs
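
    To give a flavour of what “explainable” means in code, here is a toy sketch (synthetic data and purely illustrative feature names, not the models described in the POV): a gradient-boosted classifier is trained on EMR-style tabular features, and SHAP attributes each prediction to the inputs that drove it.

    import numpy as np
    import pandas as pd
    import shap
    from sklearn.ensemble import GradientBoostingClassifier

    # Synthetic EMR-style features; column names are illustrative only
    rng = np.random.default_rng(42)
    X = pd.DataFrame({
        "age": rng.integers(30, 80, 500),
        "bmi": rng.normal(27, 4, 500),
        "hba1c": rng.normal(5.8, 0.9, 500),
        "systolic_bp": rng.normal(130, 15, 500),
    })
    y = ((X["hba1c"] > 6.5) | (X["bmi"] > 32)).astype(int)  # toy risk label

    model = GradientBoostingClassifier().fit(X, y)

    # SHAP values explain each individual prediction in terms of the input features
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X.iloc[:5])
    print(pd.DataFrame(shap_values, columns=X.columns).round(3))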

    Who Shouldn’t Miss This POV

    • Hospital & clinical leaders
    • Digital health innovators
    • EMR/HealthTech product owners
    • Population health & payer strategy teams

    If early risk detection, preventive care, and explainable AI are priorities, this POV will equip you with high-impact insights.

    From Connected to Intelligent: The Evolution of Smart Homes https://www.rsystems.com/white-papers/from-connected-to-intelligent-the-evolution-of-smart-homes/ Wed, 03 Dec 2025 13:51:51 +0000 https://www.rsystems.com/?p=40149 Smart homes have become an everyday reality, encompassing a wide range of connected devices that simplify our day-to-day. However, beyond connected devices, in this whitepaper we explore the next phase, where multi-agent AI can transform smart homes into autonomous, intelligent and cooperative entities, capable of reacting and adapting to inhabitants’ behavior.

    Overview:

    From futuristic speculation to everyday reality, smart homes can go way beyond connected devices – they can become intelligent, collaborative, reactive and adaptable environments. This can be achieved using Multi-agent AI Systems (MAS) to unify IoT devices and lay a solid foundation for innovation, for more seamless and secure living.  

    This remarkable growth of smart homes brings both opportunities and challenges. In this whitepaper, we’ll explore both, moving from the general – market overview and predictions – to the specific – blueprint architecture and use cases – using AWS Harmony.

    Here’s a breakdown of the whitepaper:

    • The Smart Homes market landscape: what is the current state and changes to expect
    • Multi-Agent AI Systems (MAS): how they work and why they’re transforming Smart Homes
    • The technology behind MAS: capabilities, practical applications and benefits
    • Smart Homes on AWS Harmony: blueprint of Agentic AI as the foundation for next-gen experiences
    • Use case for sustainable living: a hybrid Edge + Cloud IoT high-level architecture to implement for energy saving

    If You Pity Yourself, Others Will Too – Jyoti’s Story of Resilience and Determination https://www.rsystems.com/rsystems-voices/if-you-pity-yourself-others-will-too-jyotis-story-of-resilience-and-determination/ Wed, 03 Dec 2025 12:59:18 +0000 https://www.rsystems.com/?p=40145 We are proud to share that Jyoti Dash, our General Manager – Operations, was featured in Times of India on International Day of Persons with Disabilities, sharing her inspiring journey of resilience, determination, and growth. Under the powerful heading “If you pity yourself, others will too,” Jyoti shared her story: “I’m physically challenged, and growing […]

    We are proud to share that Jyoti Dash, our General Manager – Operations, was featured in Times of India on International Day of Persons with Disabilities, sharing her inspiring journey of resilience, determination, and growth.

    Under the powerful heading “If you pity yourself, others will too,” Jyoti shared her story:

    “I’m physically challenged, and growing up, that made me extremely shy because of which I faced bias early on, whether it was being excluded from school annual functions, sports days, or never being considered for roles like class monitor or head girl. These moments stayed with me, but discovering the arts helped me slowly find my place. Winning several medals taught me that if I put myself out there, I could be seen for my talent and not my disability. When I stepped into the professional world, the bias continued. My first job interview rejected me because they assumed I wouldn’t even be able to type on a computer. I sat for multiple interviews before finally getting selected, but even then, I often remained at entry-level roles because people doubted my leadership potential. My biggest turning point came early at R Systems when I was trusted with a project that required me to travel alone to the US for three months. Being on my own, without anyone to lean on, made me stronger. Soon after, I was given the opportunity to lead a new project that began with just seven people and has today grown to around sixty. Every step of this journey reinforced one important lesson: keep learning. Whether professionally or personally, continuous upskilling has always been my way forward. Most importantly, I learned never to pity myself. The moment I pity myself, I give others permission to do the same.”

    We are fortunate to have Jyoti as part of the R Systems team. Her journey with us, from being trusted with that pivotal solo project in the US to leading a team that has grown from seven to sixty members, exemplifies what’s possible when talent is recognized and nurtured without bias. Jyoti’s leadership, dedication, and continuous drive for excellence inspire all of us every day.

    At R Systems, we remain deeply committed to our Diversity, Equity, and Inclusion principles. Jyoti’s story reminds us why this commitment matters, both as policy and in practice. We’re glad that she found her place with us, and we will continue working to ensure that every team member can be seen for their talent, grow without barriers, and lead with their full potential.

    The Next Frontier in Telecom: How AI Is Reimagining Network Intelligence, Security, and Customer Experience https://www.rsystems.com/rsystems-voices/the-next-frontier-in-telecom-how-ai-is-reimagining-network-intelligence-security-and-customer-experience/ Thu, 20 Nov 2025 13:28:24 +0000 https://www.rsystems.com/?p=40066 For decades, telecom innovation has been about connecting people faster, clearer, and more reliably. But today, we’re entering a new era – one where machines can understand people, not just connect them. Artificial Intelligence (AI) is rapidly transforming telecom networks into intelligent ecosystems that learn, predict, and act. And for Communications and Service Delivery Platform […]

    For decades, telecom innovation has been about connecting people faster, clearer, and more reliably. But today, we’re entering a new era – one where machines can understand people, not just connect them.

    Artificial Intelligence (AI) is rapidly transforming telecom networks into intelligent ecosystems that learn, predict, and act. And for Communications Service Providers (CSPs) and Service Delivery Platform (SDP) providers, this shift represents a strategic turning point.

    At our recent presentation for industry peers, Bogdan Tudan, VP of Telecom, Media & Entertainment, explored what’s possible when AI moves from being an “add-on” to becoming an embedded intelligence layer in telecom systems. From self-designing IVRs to fraud-blocking digital guardians, the impact is profound.

    Let’s unpack what this means in real-world terms.

    1. From Code to Conversation: The Evolution of Call Flow Design

    Not long ago, building or updating an IVR (Interactive Voice Response) system was a slow, technical process. You’d discuss call flows with operators, wait days for implementation, and repeat the entire cycle for every minor change.

    Today, thanks to Service Delivery Platforms (SDPs), that’s ancient history. Enterprises can already log in, design their own routing logic through a self-care interface, and deploy it instantly.

    But what if that process became even simpler — as natural as talking to a colleague?

    Imagine designing your call flow not by dragging boxes or reading manuals, but by telling an AI assistant what you want. “Route all calls in Spanish to our Madrid team,” or “Play a service outage message for customers in Zone 4.”

    The AI would understand your intent, configure the flow, and show you the result instantly — all while retaining the option to fine-tune manually.

    This is where telecom UX meets generative AI (GenAI): making configuration conversational, intuitive, and intelligent.

    2. Turning Data into Dialogue: AI-Driven Insights and Optimization

    Once the AI assistant knows your call structure, it can go a step further: analyze how well it’s performing.

    • How many callers reach the right destination?
    • Where do most calls drop?
    • Are certain menus confusing customers?

    With AI, you don’t just get data — you get recommendations. The system can proactively suggest improvements, much like a digital operations coach.

    Consider this scenario: a fiber outage hits a local area. Traditionally, your support lines would flood with calls. But now, you simply tell your AI assistant, “Announce that our team is fixing the issue and service will resume by 5 PM.”

    Within seconds, every incoming caller hears a calm, professional update. No manual reconfiguration. No waiting. Just real-time, automated customer care — powered by natural language and intelligent automation.

    3. Fighting Fraud with Intelligent Guardians

    Of course, telecom isn’t just about connection and convenience — it’s about trust. And that trust is under siege.

    Every year, U.S. operators face more than 50 billion scam calls, resulting in over $39 billion in estimated losses. Globally, the threat landscape is just as alarming.

    Traditional fraud management tools on SDPs already help — flagging suspicious patterns, blocking one-ring scams, and filtering spoofed calls. But they’re inherently reactive.

    So what if AI could listen and understand — in real time?

    We’re experimenting with “AI security agents” that monitor flagged calls and detect suspicious behavior based on conversation context. For example:

    “May I have your PIN to verify a transaction?”

    In that instant, the AI recognizes a likely scam attempt and can respond in multiple ways:

    • Block the call outright.
    • Whisper a warning to the user (“This doesn’t sound like a legitimate bank request”).
    • Flag and record the incident for operator review.

    Because AI agents would only monitor suspicious calls — less than 1% of total network traffic — the approach is both scalable and cost-efficient. It’s proactive fraud prevention with minimal processing overhead.
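
    As a highly simplified illustration of the kind of check such an agent might run (a rule-based toy here; real gateways would rely on trained models, and the phrases and function are hypothetical):

    import re

    # Phrases that legitimate institutions rarely ask for on an unsolicited call
    SUSPICIOUS_PATTERNS = [
        r"\b(pin|one[- ]time password|otp)\b",
        r"verify .*transaction",
        r"gift ?cards?",
    ]

    def assess_call_snippet(transcript: str) -> str:
        """Return an action for a flagged call: 'block', 'whisper_warning', or 'allow'."""
        hits = [p for p in SUSPICIOUS_PATTERNS if re.search(p, transcript, re.IGNORECASE)]
        if len(hits) >= 2:
            return "block"
        if hits:
            return "whisper_warning"  # warn the user, keep the call up, log for review
        return "allow"

    print(assess_call_snippet("May I have your PIN to verify a transaction?"))  # -> block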

    This isn’t science fiction. Several European operators are already piloting AI-embedded gateways that can do precisely this. Within 6–12 months, such solutions could be commercially available — and represent a new revenue stream for security-conscious operators.

    4. Outsmarting Scammers — Literally

    One of our favorite examples comes from a UK operator who took a brilliantly creative approach to scam prevention.

    When a scam call was detected, instead of simply dropping it, the system redirected the call to an AI-generated persona — a cheerful “grandmother” who would keep the scammer talking endlessly.

    This conversational decoy wasted the scammer’s time and resources while protecting real customers. The longest recorded call? 15 minutes.

    Sometimes, intelligence doesn’t just stop bad behavior — it makes it unprofitable.

    5. The Road Ahead: AI as a Telecom Multiplier

    AI’s potential in telecom extends far beyond automation. It’s about embedding understanding and context into every network layer:

    • Intelligent call routing that designs itself.
    • Predictive maintenance and self-healing systems.
    • AI-driven fraud and risk detection.
    • Conversational analytics for customer experience.

    As generative models mature, we’ll see CSPs and SDPs evolve into adaptive service ecosystems — networks that not only deliver connectivity but continuously learn and optimize.

    At R Systems, we see AI not as a technology trend, but as the next step in digital product engineering for telecom. By merging GenAI, SDP capabilities, and domain expertise, we’re helping operators move from reactive operations to predictive intelligence — and from service providers to true experience orchestrators.

    Because in the future of telecom, machines won’t just connect us.
    They’ll understand us.

    The Data Lake Revolution: Unleashing the Power of Delta Lake https://www.rsystems.com/blogs/the-data-lake-revolution-unleashing-the-power-of-delta-lake/ Wed, 19 Nov 2025 00:00:00 +0000 Once upon a time, in the vast and ever-expanding world of data storage and processing, a new hero emerged. Its name? Delta Lake. This unsung champion was about to revolutionize the way organizations handled their data, and its journey was nothing short of remarkable. The Need for a Data Savior In this world, data was […]

    Once upon a time, in the vast and ever-expanding world of data storage and processing, a new hero emerged. Its name? Delta Lake. This unsung champion was about to revolutionize the way organizations handled their data, and its journey was nothing short of remarkable.

    The Need for a Data Savior

    In this world, data was king, and it resided in various formats within the mystical realm of data lakes. Two popular formats, Parquet and Hive, had served their purposes well, but they harbored limitations that often left data warriors frustrated.

    Enterprises faced a conundrum: they needed to make changes, updates, or even deletions to individual records within these data lakes. But it wasn’t as simple as it sounded. Modifying schemas was a perilous endeavor that could potentially disrupt the entire data kingdom.

    Why? Because these traditional table formats lacked a vital attribute: ACID transactions. Without these safeguards, every change was a leap of faith.

    The Rise of Delta Lake

    Amidst this data turmoil, a new contender emerged: Delta Lake. It was more than just a format; it was a game-changer.

    Delta Lake brought with it the power of ACID transactions. Every data operation within the kingdom was now imbued with atomicity, consistency, isolation, and durability. It was as if Delta Lake had handed data warriors an enchanted sword, making them invincible in the face of chaos.

    But that was just the beginning of Delta Lake’s enchantment.

    The Secrets of Delta Lake

    Delta Lake was no ordinary table format; it was a storage layer that transcended the limits of imagination. It integrated seamlessly with Spark APIs, offering features that left data sorcerers in awe.

    • Time Travel: Delta Lake allowed users to peer into the past, accessing previous versions of data. The transaction log became a portal to different eras of data history.
    • Schema Evolution: It had the power to validate and evolve schemas as data changed. A shapeshifter of sorts, it embraced change effortlessly.
    • Change Data Feed: With this feature, it tracked data changes at the granular level. Data sorcerers could now decipher the intricate dance of inserts, updates, and deletions.
    • Data Skipping with Z-ordering: Delta Lake mastered the art of optimizing data retrieval. It skipped irrelevant files, ensuring that data requests were as swift as a summer breeze.
    • DML Operations: It wielded the power of SQL-like data manipulation language (DML) operations. Updates, deletes, and merges were but a wave of its hand.

    Delta Lake’s Allies

    Delta Lake didn’t stand alone; it forged alliances with various data processing tools and platforms. Apache Spark, Apache Flink, Presto, Trino, Hive, DBT, and many others joined its cause. They formed a coalition to champion the cause of efficient data processing.

    In the vast landscape of data management, Delta Lake stands as a beacon of innovation, offering a plethora of features that elevate your data handling capabilities to new heights. In this exhilarating adventure, we’ll explore the key features of Delta Lake and how they triumph over the limitations of traditional file formats, all while embracing the ACID properties.

    ACID Properties: A Solid Foundation

    In the realm of data, ACID isn’t just a chemical term; it’s a set of properties that ensure the reliability and integrity of your data operations. Let’s break down how Delta Lake excels in this regard.

    A for Atomicity: All or Nothing

    Imagine a tightrope walker teetering in the middle of their performance—either they make it to the other side, or they don’t. Atomicity operates on the same principle: either all changes happen, or none at all. In the world of Spark, this principle often takes a tumble. When a write operation fails midway, the old data is removed, and the new data is lost in the abyss. Delta Lake, however, comes to the rescue. It creates a transaction log, recording all changes made along with their versions. In case of a failure, data loss is averted, and your system remains consistent.

    C for Consistency: The Guardians of Validity

    Consistency is the gatekeeper of data validity. It ensures that your data remains rock-solid and valid at all times. Spark sometimes falters here. Picture this: your Spark job fails, leaving your system with invalid data remnants. Consistency crumbles. Delta Lake, on the other hand, is your data’s staunch guardian. With its transaction log, it guarantees that even in the face of job failure, data integrity is preserved.

    I for Isolation: Transactions in Solitude

    Isolation is akin to individual bubbles, where multiple transactions occur in isolation, without interfering with one another. Spark might struggle with this concept. If two Spark jobs manipulate the same dataset concurrently, chaos can ensue. One job overwrites the dataset while the other is still using it—no isolation, no guarantees. Delta Lake, however, introduces order into the chaos. Through its versioning system and transaction log, it ensures that transactions proceed in isolation, mitigating conflicts and ensuring the data’s integrity.

    D for Durability: Unyielding in the Face of Failure

    Durability means that once changes are made, they are etched in stone, impervious to system failures. Spark’s Achilles’ heel lies in its vulnerability to data loss during job failures. Delta Lake, however, boasts a different tale. It secures your data with unwavering determination. Every change is logged, and even in the event of job failure, data remains intact—a testament to true durability.

    Time Travel: Rewriting the Past

    Now, let’s embark on a fascinating journey through time. Delta Lake introduces a feature that can only be described as “time travel.” With this feature, you can revisit previous versions of your data, just like rewinding a movie. All of this magical history is stored in the transaction log, encapsulated within the mystical “_delta_log” folder. When you write data to a Delta table, it’s not just the present that’s captured; the past versions are meticulously preserved, waiting for your beck and call.
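
    A minimal example of what this looks like in practice, assuming the Delta-enabled SparkSession and table path set up later in this post:

    # Read an earlier version of the table by version number...
    old_df = spark.read.format("delta") \
        .option("versionAsOf", 0) \
        .load("./Documents/DE/Delta/test-db/organisatuons")
    old_df.show()

    # ...or by timestamp (illustrative value)
    # spark.read.format("delta").option("timestampAsOf", "2025-11-01").load(path)

    # The transaction log itself can be inspected through the table history:
    from delta.tables import DeltaTable
    DeltaTable.forPath(spark, "./Documents/DE/Delta/test-db/organisatuons").history().show()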

    In conclusion, Delta Lake emerges as the hero of the data world, rewriting the rules of traditional file formats and conquering the challenges of the ACID properties. With its robust transaction log, versioning system, and the ability to traverse time, Delta Lake opens up a new dimension in data management. So, if you’re on a quest for data reliability, integrity, and a touch of magic, Delta Lake is your trusted guide through this thrilling journey beyond convention.

    More Features of Delta Lake:

    • UPSERT (see the merge sketch after this list)
    • Schema Evolution
    • Change Data Feed
    • Data Skipping with Z-ordering
    • DML Operations
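
    For instance, the UPSERT above maps onto Delta’s MERGE API. A minimal sketch (the table path, IDs, and column names are hypothetical):

    from delta.tables import DeltaTable

    # An existing Delta table and a DataFrame of new or changed rows
    target = DeltaTable.forPath(spark, "/tmp/delta/customers")
    updates_df = spark.createDataFrame(
        [(1, "Alice", "DE"), (4, "Dana", "FR")],
        ["id", "name", "country"],
    )

    (target.alias("t")
        .merge(updates_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()      # rows that already exist get updated
        .whenNotMatchedInsertAll()   # new rows get inserted
        .execute())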

    The Quest for Delta Lake

    Setting up Delta Lake was like embarking on a quest. Data adventurers ventured into the cloud, AWS, GCP, Azure, or even their local domains. They armed themselves with the delta-spark spells and summoned the JARs of delta-core, delta-contrib, and delta-storage, tailored to their Spark versions.

    Requirements:

    • Python
    • Delta-spark
    • Delta jars

    You can configure the Spark session to define the package name so it is downloaded at run time. In this example, I am using Spark version 3.3, which requires delta-core, delta-contribs, and delta-storage. You can download them from here: https://github.com/delta-io/delta/releases/

    To use and configure various cloud storage options, there are separate .jars you can use: https://docs.delta.io/latest/delta-storage.html. Here, you can find .jars for AWS, GCS, and Azure to configure and use their storage services.

    Run this command to install delta-spark first:

    pip install delta-spark

    (If you are using Dataproc or EMR, you can install this while creating the cluster as a startup action; if you are using a serverless environment like Glue or Dataproc batches, you can create a Docker build or pass the .whl file for this package.)

    The same applies to the .jar files. In a serverless setup, download the .jar, store it in cloud storage such as S3 or GCS, and use that path while running the job. On a cluster such as Dataproc or EMR, you can download the .jar onto the cluster itself.

    These .jars can also be downloaded at run time while creating the Spark session.

    Now, create the Spark session, and you are ready to play with Delta tables.

    Environment Setup

    How do you add the Delta Lake dependencies to your environment?

    1. You can directly add them while initializing the Spark session for Delta Lake by passing the specific version, and these packages or dependencies will be downloaded during run time.
    2. You can place the required .jar files in your cluster and provide the reference while initializing the Spark session.
    3. You can download the .jar files and store them in cloud storage, and you can pass them as a run time argument if you don’t want to download the dependencies on your cluster.
    # Initialize the Spark session
    import pyspark
    from delta import *
    from pyspark.sql.types import *
    from delta.tables import *
    from pyspark.sql.functions import *

    # Option 1: download the Delta package at run time
    builder = pyspark.sql.SparkSession.builder.appName("My App") \
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
        .config("spark.jars.packages", "io.delta:delta-core_2.12:2.2.0")

    # Option 2: if the jar is already available on the cluster
    builder = pyspark.sql.SparkSession.builder.appName("My App") \
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

    spark = builder.getOrCreate()

    You have to add the following properties to use Delta in Spark:

    • spark.sql.extensions
    • spark.sql.catalog.spark_catalog

    You can see these values in the above code snippet. If you want to use cloud storage, for example reading and writing data from S3, GCS, or Blob Storage, you have to set some more configs in the Spark session. Here, I am providing examples for AWS and GCS only.

    The next thing that will come to your mind: how will you be able to read or write the data into cloud storage?

    For different cloud storages, there are certain .jar files available that are used to connect and to do IO operations on the cloud storage. See the examples below.

    You can use the above approach to make these .jars available to the Spark session, either by downloading them at run time or by storing them on the cluster itself.

    AWS

    spark_jars_packages = "com.amazonaws:aws-java-sdk:1.12.246,org.apache.hadoop:hadoop-aws:3.2.2,io.delta:delta-core_2.12:2.2.0"

    builder = SparkSession.builder.appName("delta") \
      .config("spark.jars.packages", spark_jars_packages) \
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
      .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider") \
      .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
      .config("spark.hadoop.fs.AbstractFileSystem.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
      .config("spark.delta.logStore.class", "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore")

    spark = builder.getOrCreate()

GCS

spark_session = SparkSession.builder.appName('delta').getOrCreate()

spark_session.conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
spark_session.conf.set("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
spark_session.conf.set("fs.gs.auth.service.account.enable", "true")
spark_session.conf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
spark_session.conf.set("fs.gs.project.id", project_id)
spark_session.conf.set("fs.gs.auth.service.account.email", credential["client_email"])
spark_session.conf.set("fs.gs.auth.service.account.private.key.id", credential["private_key_id"])
spark_session.conf.set("fs.gs.auth.service.account.private.key", credential["private_key"])
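The project_id and credential values referenced above are assumed to be defined elsewhere. A minimal sketch of one way to obtain them, assuming a GCP service-account key file at a hypothetical path:

import json

# Hypothetical path to a GCP service-account JSON key file
with open("/path/to/service-account.json") as f:
    credential = json.load(f)

project_id = credential["project_id"]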

Write into Delta Tables: In the following example, we use the local file system for reading from and writing to Delta Lake tables.

    Data Set Used: https://media.githubusercontent.com/media/datablist/sample-csv-files/main/files/organizations/organizations-100.zip

For reference, I have downloaded this file to my local machine and unzipped the data:

df = spark.read.option("header", "true").csv("organizations-100.csv")
partition_keys = ["Country"]  # example partition column(s) from this dataset
df.write.mode('overwrite').format("delta").partitionBy(partition_keys).save("./Documents/DE/Delta/test-db/organisatuons")

The two most commonly used save modes when writing data into Delta tables from any source are append and overwrite.

For now, we have enabled the Delta catalog to store all metadata-related information. We can also use the Hive metastore to store the metadata and run SQL queries directly over the Delta tables. A cloud storage path can be used as well.

    Read data from the Delta tables:

    delta_df = spark.read.format("delta").load("./Documents/DE/Delta/test-db/organisatuons")
    delta_df.show()

After writing the data into the Delta table, you can see the folder structure: Delta creates a _delta_log directory whose log files keep track of metadata, partitions, and data files.

    Option 2: Create Delta Table and insert data using Spark SQL.

    spark.sql("CREATE TABLE orgs_data(index String, c_name String, organization_id String, name String, website String, country String, description String, founded String, industry String, num_of_employees String, remarks String) USING DELTA")

    Insert the data:

    df.write.mode('append').format("delta").option("mergeSchema", "true").saveAsTable("orgs_data")
    spark.sql("SELECT * FROM orgs_data").show()
    spark.sql("DESCRIBE TABLE orgs_data").show()

This way, we can read the Delta table, and you can use SQL as well if you have enabled the Hive metastore.

    Schema Enforcement: Safeguarding Your Data

In the realm of data management, maintaining the integrity of your dataset is paramount. Delta Lake, with its schema enforcement capabilities, ensures that your data is not just welcomed with open arms but also closely scrutinized for compatibility. Let’s dive into the meticulous checks Delta Lake performs when validating incoming data against the existing schema:

    Column Presence: Delta Lake checks that every column in your DataFrame matches the columns in the target Delta table. If there’s a single mismatch, it won’t let the data in and, instead, will raise a flag in the form of an exception.

    Data Types Harmony: Data types are the secret language of your dataset. Delta Lake insists that the data types in your incoming DataFrame align harmoniously with those in the target Delta table. Any discord in data types will result in a raised exception.

Name Consistency: In the world of data, names matter. Delta Lake verifies that the column names in your incoming DataFrame are an exact match to those in the target Delta table. No aliases allowed. Any discrepancies will lead to, you guessed it, an exception.

    This meticulous schema validation guarantees that your incoming data seamlessly integrates with the target Delta table. If any aspect of your data doesn’t meet these strict criteria, it won’t find a home in the Delta Lake, and you’ll be greeted by an error message and a raised exception.
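As a minimal sketch (not from the original walkthrough), here is what schema enforcement looks like in practice: appending a DataFrame whose columns don't match the orgs_data table created earlier should fail with an AnalysisException instead of silently changing the table.

from pyspark.sql.utils import AnalysisException

# Hypothetical DataFrame with a column that does not exist in orgs_data
bad_df = spark.createDataFrame([("1", "Acme", "oops")], ["index", "name", "unexpected_col"])

try:
    bad_df.write.mode("append").format("delta").saveAsTable("orgs_data")
except AnalysisException as e:
    # Schema enforcement rejects the mismatched write
    print("Write rejected:", e)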

    Schema Evolution: Adapting to Changing Data

    In the dynamic landscape of data, change is the only constant. Delta Lake’s schema evolution comes to the rescue when you need to adapt your table’s schema to accommodate incoming data. This powerful feature offers two distinct approaches:

    Overwrite Schema: You can choose to boldly overwrite the existing schema with the schema of your incoming data. This is an excellent option when your data’s structure undergoes significant changes. Just set the “overwriteSchema” option to true, and voila, your table is reborn with the new schema.

    Merge Schema: In some cases, you might want to embrace the new while preserving the old. Delta Lake’s “Merge Schema” property lets you merge the incoming data’s schema with the existing one. This means that if an extra column appears in your data, it elegantly melds into the target table without throwing any schema-related tantrums.

    Should you find the need to tweak column names or data types to better align with the incoming data, Delta Lake’s got you covered. The schema evolution capabilities ensure your dataset stays in tune with the ever-changing data landscape. It’s a smooth transition, no hiccups, and no surprises, just data management at its finest.

spark.read.table(...) \
  .withColumn("birthDate", col("birthDate").cast("date")) \
  .write \
  .format("delta") \
  .mode("overwrite") \
  .option("overwriteSchema", "true") \
  .saveAsTable(...)

    The above code will overwrite the existing delta table with the new schema along with the new data.

Delta Lake supports automatic schema evolution. For instance, if you add two more columns to a Delta table, existing reads of that table continue to work without any error.

There is another way as well. For example, if the Delta table has three columns but the incoming data has four, you can set spark.databricks.delta.schema.autoMerge.enabled to true. This can also be done for the entire cluster.
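A minimal sketch of setting this property at the session level:

# Enable automatic schema merging for Delta writes in this session
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")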

    spark.sql("DESCRIBE TABLE orgs_data").show()

    Let’s add one more column and try to access the data again:

spark.sql("ALTER TABLE orgs_data ADD COLUMNS (extra_col String)")

spark.sql("DESCRIBE TABLE orgs_data").show()

As you can see, the column has been added without impacting the existing data. You can still read the data seamlessly; the newly added column is simply set to null for existing rows.

What happens if the incoming CSV contains an extra column that we want to append to the existing Delta table? You have to set one config for that:

    input_df = spark.read.format('csv').option('header', 'true').load("../Desktop/Data-Engineering/data-samples/input-data/organizations-11111.csv")
    input_df.printSchema()
    input_df.write.mode('append').format("delta").option("mergeSchema", "true").saveAsTable("orgs_data")
    spark.sql("SELECT * FROM orgs_data").show()
    spark.sql("DESCRIBE TABLE orgs_data").show()

You have to add the option mergeSchema=true while appending the data. It merges the schema of the incoming data, which contains some extra columns, into the table schema.

Printing the incoming data’s schema shows the extra column, and we have already seen the schema of our Delta table above. After the append, the new column from the incoming data is merged into the existing table schema, so the Delta table now carries the latest, updated schema.

    Time Travel 

Delta Lake keeps track of every change by writing log files under _delta_log. Using this, we can fetch the data of a previous version by specifying the version number.

    df = spark.read.format("delta").option("versionAsOf", 0).load("orgs_data")
    df.show()

Here, we can see the first version of the data, before any columns were added. Because the Delta log records the information of each commit, we can fetch the data as it stood at any particular commit.
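Two related sketches that can help here: DESCRIBE HISTORY lists the commits recorded in _delta_log, and time travel can also be done by timestamp instead of version number. The timestamp below is only illustrative and must fall within the table’s retained history.

# List the commits recorded for the table
spark.sql("DESCRIBE HISTORY orgs_data").show(truncate=False)

# Time travel by timestamp instead of version number
df_ts = spark.read.format("delta") \
    .option("timestampAsOf", "2023-10-01 00:00:00") \
    .load("orgs_data")
df_ts.show()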

    Upsert, Delete, and Merge

    Unlocking the Power of Upsert with Delta Lake

    In the exhilarating realm of data management, upserting shines as a vital operation, allowing you to seamlessly merge new data with your existing dataset. It’s the magic wand that updates, inserts, or even deletes records based on their status in the incoming data. However, for this enchanting process to work its wonders, you need a key—a primary key, to be precise. This key acts as the linchpin for merging data, much like a conductor orchestrating a symphony.

    A Missing Piece: Copy on Write and Merge on Read

    Now, before we delve into the mystical world of upserting with Delta Lake, it’s worth noting that Delta Lake dances to its own tune. Unlike some other table formats like Hudi and Iceberg, Delta Lake doesn’t rely on the concepts of Copy on Write and Merge on Read. These techniques are used elsewhere to speed up data operations.

    Two Paths to Merge: SQL and Spark API

    To harness the power of upserting in Delta Lake, you have two pathways at your disposal: SQL and Spark API. The choice largely depends on your Delta version. In the latest Delta version, 2.2.0, you can seamlessly execute merge operations using Spark API. It’s a breeze. However, if you’re working with an earlier Delta version, say 1.0.0, then Spark SQL is your trusty steed for upserts and merges. Remember, using the right Delta version is crucial, or you might find yourself grappling with the cryptic “Method not found” error, which can turn into a debugging labyrinth.

    In the snippet below, we showcase the elegance of upserting using Spark SQL, a technique that ensures your data management journey is smooth and error-free:

-- Insert new data, update existing rows, and delete target rows missing from the source, matched on the specified key
    MERGE INTO targetTable AS target
    USING sourceTable AS source
    ON target.id = source.id
    WHEN MATCHED THEN
      UPDATE SET *
    WHEN NOT MATCHED THEN
      INSERT *
    WHEN NOT MATCHED BY SOURCE THEN
      DELETE;

    today_data_df = spark.read.format('csv').option('header', 'true').load("../Desktop/Data-Engineering/data-samples/input-data/organizations-11111.csv")
    today_data_df.show()
    
    
    spark.sql("select * from orgs_data where organization_id = 'FAB0d41d5b5ddd'").show()
    
    
# Reading the existing Delta table created earlier with saveAsTable
deltaTable = DeltaTable.forName(spark, "orgs_data")
    
    today_data_df.createOrReplaceTempView("incoming_data")

Here, we load the incoming data and show its contents, along with the existing Delta table row that has the same primary key, so we can compare before and after the merge.

    spark.sql(
    """
    MERGE INTO orgs_data
    USING incoming_data
    ON orgs_data.organization_id = incoming_data.organization_id
    WHEN MATCHED THEN
      UPDATE SET
        organization_id = incoming_data.organization_id,
        name = incoming_data.name
    """
    )
    
    spark.sql("select * from orgs_data where organization_id = 'FAB0d41d5b5ddd'").show()

deltaTable.alias("oldData").merge(
    today_data_df.alias("newData"),
    "oldData.organization_id = newData.organization_id") \
    .whenMatchedUpdateAll() \
    .whenNotMatchedInsertAll() \
    .execute()

This is an example of how you can upsert using the Spark API. The merge operation can create lots of small files; you can control this by setting the following properties in the Spark session.

spark.databricks.delta.merge.repartitionBeforeWrite.enabled true

spark.sql.shuffle.partitions 10

This is how merge operations work. Merge requires a one-to-one mapping: each source row may update at most one row in the target Delta table, and if multiple source rows match the same target row, the merge fails. Delta Lake matches rows on the basis of the merge key for update operations.
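If the incoming data can contain duplicate keys, one simple way to satisfy this one-to-one rule is to deduplicate the source on the merge key before merging. A minimal sketch, reusing the today_data_df DataFrame from the example above:

# Keep a single row per merge key so each source row matches at most one target row
deduped_df = today_data_df.dropDuplicates(["organization_id"])
deduped_df.createOrReplaceTempView("incoming_data")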

    Change Data Feed

This is another useful feature of Delta Lake: it tracks row-level change history for all records in a Delta table across inserts, updates, and deletes. You can enable it up front while setting up the Spark session, or via Spark SQL by enabling “change events” for all the data.

Now, you can see the whole journey of each record in the Delta table, from insertion to deletion. It introduces one extra column, _change_type, which records the type of operation performed on that particular row.

    To enable this, you can set these configurations: 

    spark.sql("set spark.databricks.delta.properties.defaults.enableChangeDataFeed = true;") 

You can also pass the readChangeFeed option while reading the Delta table, as shown below.
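The change data feed can also be enabled per table through a table property; a minimal sketch against the orgs_data table used earlier:

# Enable the change data feed on an already-created Delta table
spark.sql("ALTER TABLE orgs_data SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")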

    ## Stream Data Generation
    
    data = [{"Category": 'A', "ID": 1, "Value": 121.44, "Truth": True, "Year": 2022},
            {"Category": 'B', "ID": 2, "Value": 300.01, "Truth": False, "Year": 2020},
            {"Category": 'C', "ID": 3, "Value": 10.99, "Truth": None, "Year": 2022},
            {"Category": 'E', "ID": 5, "Value": 33.87, "Truth": True, "Year": 2022}
            ]
    
    df = spark.createDataFrame(data)
    
    df.show()
    
    df.write.mode('overwrite').format("delta").partitionBy("Year").save("silver_table")

    deltaTable = DeltaTable.forPath(spark, "silver_table")
                                     
    deltaTable.delete(condition = "ID == 1")
    
    delta_df = spark.read.format("delta").option("readChangeFeed", "true").option("startingVersion", 0).load("silver_table")
    delta_df.show()

Now, after the delete, you can see exactly which rows changed. If you perform upserts on the same Delta table after enabling the change data feed, you will see the updates as well, and inserts will likewise show up in the feed.

If we overwrite the complete Delta table, the change feed marks all previous records as deleted.

    If you want to record each data change, you have to enable this before creating the table so that we can see the data changes for each version. If you’ve already created one table, you won’t be able to see the changes for the previous version once you enable the change data feed, but you will be able to see the changes in all versions that came after this configuration.

    Data Skipping with Z-ordering

Data skipping is a Delta Lake technique where, when a large number of records is stored across many files, only the files that contain the required information are read and the rest are skipped. This makes reading from Delta tables faster.

Z-ordering is a technique used to colocate related information in the same set of files. If you know which column is used most often in query predicates and it has high cardinality, you can Z-order by that column; this reduces the number of files that need to be read. You can specify multiple columns for Z-ordering, separated by commas.

Suppose you have a table where one column appears in most of your queries. Z-ordering by that column increases the number of files that can be skipped while running those queries. A normal sort order clusters data linearly along one dimension, whereas Z-ordering clusters it across multiple dimensions.

    OPTIMIZE events
    WHERE date >= current_timestamp() - INTERVAL 1 day
    ZORDER BY (eventType)
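The same Z-ordering can also be run through the Python API (available since Delta Lake 2.0). A minimal sketch against the silver_table written earlier, Z-ordering by the Category column:

from delta.tables import DeltaTable

# Compact the table and colocate rows with similar Category values
deltaTable = DeltaTable.forPath(spark, "silver_table")
deltaTable.optimize().executeZOrderBy("Category")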

    DML Operations

Delta Lake can run standard SQL DML operations directly on the data lake, such as update, delete, and merge.
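As a quick sketch of these operations on the orgs_data table used throughout this post (the column values below are only illustrative), both the SQL and the Python API forms look like this:

from delta.tables import DeltaTable
from pyspark.sql.functions import lit

# SQL form
spark.sql("UPDATE orgs_data SET country = 'USA' WHERE country = 'United States'")
spark.sql("DELETE FROM orgs_data WHERE founded = '1999'")

# Python API form
deltaTable = DeltaTable.forName(spark, "orgs_data")
deltaTable.update(condition="country = 'United States'", set={"country": lit("USA")})
deltaTable.delete("founded = '1999'")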

    Integrations and Ecosystem Supported in Delta Lake

Read Delta Tables

    Unlock the Delta Tables: Tools That Bring Data to Life

    Reading data from Delta tables is like diving into a treasure trove of information, and there’s more than one way to unlock its secrets. Beyond the standard Spark API, we have a squad of powerful allies ready to assist: SQL query engines like Athena and Trino. But they’re not just passive onlookers; they bring their own magic to the table, empowering you to perform data manipulation language (DML) operations that can reshape your data universe.

    Athena: Unleash the SQL Sorcery

    Imagine Athena as the Oracle of data. With SQL as its spellbook, it delves deep into your Delta tables, fetching insights with precision and grace. But here’s the twist: Athena isn’t just for querying; like a skilled blacksmith, it can help you hammer your data into a new shape, creating a masterpiece.

    Trino: The Shape-Shifting Wizard

    Trino, on the other hand, is the shape-shifter of the data realm. It glides through Delta tables, allowing you to perform an array of DML operations that can transform your data into new, dazzling forms. Think of it as a master craftsman who can sculpt your data, creating entirely new narratives and visualizations.

    So, when it comes to Delta tables, these tools are not just readers; they are your co-creators. They enable you to not only glimpse the data’s beauty but also mold it into whatever shape serves your purpose. With Athena and Trino at your side, the possibilities are as boundless as your imagination.

Read Delta Tables Using Spark APIs

from delta.tables import *

delta_df = DeltaTable.forPath(spark, "./Documents/DE/Delta/test-db/organisatuons")
delta_df.toDF().show()

Steps to Set Up Delta Lake with S3 on EC2 or EMR and Access Data through Athena

Data Set Used – We have generated around 100 GB of dummy data and written it into Delta tables.

Step 1 – Set up a Spark session with AWS cloud storage and delta-spark. Here, we have used an EC2 instance with Spark 3.3 and Delta version 2.1.1, and we configure Spark for Delta and S3.

    AWS_ACCESS_KEY_ID = "XXXXXXXXXXXXXXXXXXXXXX"
    AWS_SECRET_ACCESS_KEY = "XXXXXXXXXXXXXXXXXXXXXX+XXXXXXXXXXXXXXXXXXXXXX"
    
    spark_jars_packages = "com.amazonaws:aws-java-sdk:1.12.246,org.apache.hadoop:hadoop-aws:3.2.2,io.delta:delta-core_2.12:2.1.1"
    
    spark = SparkSession.builder.appName('delta') \
       .config("spark.jars.packages", spark_jars_packages) \
       .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
       .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
       .config('spark.hadoop.fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider') \
       .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
       .config("spark.hadoop.fs.AbstractFileSystem.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
       .config("spark.delta.logStore.class", "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore") \
       .config("spark.driver.memory", "20g") \
       .config("spark.memory.offHeap.enabled", "true") \
       .config("spark.memory.offHeap.size", "8g") \
       .getOrCreate()
    
    spark.sparkContext._jsc.hadoopConfiguration().set("fs.s3a.access.key", AWS_ACCESS_KEY_ID)
    spark.sparkContext._jsc.hadoopConfiguration().set("fs.s3a.secret.key", AWS_SECRET_ACCESS_KEY)

Spark Version – You can use any Spark version (Spark 3.3.1 was pulled in by the pip install in our case). Just make sure the version you use is compatible with your Delta Lake version; otherwise, many features won’t work.

    Step 2 – Here, we are creating a Delta table with an S3 path. We can directly write the data into an S3 bucket as a Delta table, but it is better to create a table first and then write it into S3 to make sure the schema is correct.

If the table already exists, set the Delta location path used by the Spark SQL queries; otherwise, create the Delta table at the S3 path.

    # If table is already there
    delta_path = "s3a://abhishek-test-01012023/delta-lake-sample-data/"
    spark.conf.set('table.location', delta_path)
    
    # Creating new delta table on s3 location
    spark.sql("CREATE TABLE delta.`s3://abhishek-test-01012023/delta-lake-sample-data/`(id INT, first_name String, "
             "last_name String, address String, pincocde INT, net_income INT, source_of_income String, state String, "
             "email_id String, description String, population INT, population_1 String, population_2 String, "
             "population_3 String, population_4 String, population_5 String, population_6 String, population_7 String, "
             "date String) USING DELTA PARTITIONED BY (date)")

Step 3 – Below is the link to the code I used to generate the dummy data and write it into the S3 bucket as Delta tables. Feel free to look it over. An example of the write call is given below:

    df.write.format("delta").mode("append").partitionBy("date").save("s3a://abhishek-test-01012023/delta-lake-sample-data/")

    https://github.com/velotio-tech/delta-lake-iceberg-poc/blob/0396cdbf96230609695a907fdbe8c240042fce9e/delta-data-writer.py#L83

In the above link, you can find the code for the dummy data generation.

Step 4 – Here, we print the row count and select some data from the Delta table we have just written.

    Run the SQL query to check the table data and upsert using S3 bucket data:

    spark.sql("select count(*) from delta.`s3://abhishek-test-01012023/delta-lake-sample-data/` group by id having count("
             "*) > 1").show()
    
    spark.sql("select count(*) from delta.`s3://abhishek-test-01012023/delta-lake-sample-data/`").show()
    
    #############################################################################
    # Upsert
    #############################################################################
    
# Upsert the first five records: read them, modify some column values, and merge them back into the Delta table
    
    input_df = spark.read.csv("s3a://abhishek-test-01012023/incoming_data/delta/4e0ae9f5-8c9d-435a-a434-febff1effbc3.csv",inferSchema=True,header=True)
    input_df.printSchema()
    input_df.createOrReplaceTempView("incoming")
    spark.sql("MERGE INTO delta.`s3://abhishek-test-01012023/delta-lake-sample-data/` t USING (SELECT * FROM incoming) s ON t.id = s.id WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *")

The select statements above show the current contents of the table, and the printSchema() call shows the schema of the incoming data we are planning to merge into the existing Delta table.

After the upsert, let’s look at the data for a particular id in a particular date partition:

spark.sql("select * from delta.`s3://abhishek-test-01012023/delta-lake-sample-data/` where id = 1 and date = 20221206").show()

    Access Delta table using Hive or any other external metastore: 

For that, we have to create a link between the Delta table and the metastore. To create this link, go back to the Spark code and generate a manifest file on the S3 path where we have already written the data.

    spark.sql("GENERATE symlink_format_manifest FOR TABLE 
    delta.`s3a://abhishek-test-01012023/delta-lake-sample-data/`")

This will create the manifest folder. Now go to Athena and run this query:

    CREATE EXTERNAL TABLE delta_db.delta_table(id INT, first_name String, last_name String, address String, pincocde INT, net_income INT, source_of_income String, state String, email_id String, description String, population INT, population_1 String, population_2 String, population_3 String, population_4 String, population_5 String, population_6 String, population_7 String) 
    PARTITIONED BY (date String)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION 's3://abhishek-test-01012023/delta-lake-sample-data/_symlink_format_manifest/'

    MSCK REPAIR TABLE delta_db.delta_table

After repairing the partitions, you will be able to query the data from Athena.

    Conclusion: A New Dawn

    In a world where data continued to grow in volume and complexity, Delta Lake stood as a beacon of hope. It empowered organizations to manage their data lakes with unprecedented efficiency and extract insights with unwavering confidence.

    The adoption of Delta Lake marked a new dawn in the realm of data. Whether dealing with structured or semi-structured data, it was the answer to the prayers of data warriors. As the sun set on traditional formats, Delta Lake emerged as the hero they had been waiting for—a hero who had not only revolutionized data storage and processing but also transformed the way stories were told in the world of data.

    And so, the legend of Delta Lake continued to unfold, inspiring data adventurers across the land to embark on their own quests, armed with the power of ACID transactions, time travel, and the promise of a brighter, data-driven future.

    The post The Data Lake Revolution: Unleashing the Power of Delta Lake appeared first on R Systems.
