AIOps Being Powered by Robotic Data Automation
The 48th IT Press Tour had the opportunity to meet with the management team at CloudFabrix. This is the team's fourth startup, with the previous three being sold to Cisco. They are doing a lot of interesting things automating operations.
Multi-Cloud Challenges
Multi-cloud challenges continue, as 73% of enterprises use two or more clouds. That share is projected to reach 81% by 2024. Drivers of this trend include avoiding vendor lock-in, the desire to control costs, acquisitions, vertical expertise, growth of edge computing, and IT and OT convergence. Challenges include tool sprawl, observability and AIOps, differing APIs in infrastructure and cloud services, and different data formats.
Digital Initiatives of Modern CXOs
1. Business 360 and SLA management
2. Business infrastructure refresh and rollout
3. Operations productivity and TCO reduction
4. Modern incident management
5. Future-proof business with AI/ML
Dynamic Datacenter
Companies are looking for distributed data integration and automation. The traditional data center had four dedicated layers: applications, networking, security, and infrastructure. The modern hybrid and dynamic data center is considerably more complex, spanning edge cloud, private cloud, and one or more hyperscalers. Each has its own Kubernetes and CI/CD pipelines, service-based networking, identity-based security, and infrastructure as code.
Siloed Operational Domains Have Gaps
There is a need for the enterprise to converge around a single source of truth for business and IT, IT and security, and security and networking. Observability is challenged by domain specificity, tool sprawl, alert noise, and data challenges. Security must be maintained in a full-stack context with redaction and enrichment and a full copy in S3/replay to SIEM/UEBA/SASE. Automation is needed for remediation, NLP insights and dashboards, ServiceOps, and ChatOps.
AIOps Operating Model for Platform Engineering Teams
The traditional ITSM operating model, with dev teams, apps, cloud engineering, and platform operations, proved to be too rigid, disjointed between business and IT, slow to market, and slow to innovate. The AIOps operating model is self-service, data-centric, AI/ML-based, and scalable. It is aligned with platform engineering and self-service personas.
AIOps Is Broken
Here's why:
- Data integration and ingestion: PS overhead and data gaps
- Enrichment — full stack service mapping: third-party topology, tagging, cookbooks
- Event aggregation/correlation: rule-based correlation, longer time to learn
- Predictive insights: limited predictive ML, closed ML
- ServiceOps virtual war room: no metrics or logs, no remediation, no cross-launch of tools
- Composable analytics: no composability of dashboards, services, pipelines, and bots
How To Fix AIOps
Unify observability, AIOps, and FinOps with a robotic data automation fabric across the edge, hybrid, and multi-cloud data centers.
- Data integration and ingestion: low-code bots-based data integration and ingestion
- Enrichment — full stack service mapping: automated service discovery, dependency, and impact map
- Event and log aggregation/correlation: multiple layers of correlation — AI/ML, stack, and time-based
- Predictive insights: built-in MLOps, NLP for continuous ML
- ServiceOps virtual war room: recommendation engine, MELT visualization, synthesizer, bi-directional integration
- Composable analytics: self-service persona-based dashboard, low-code bot-based services, search pipeline, and routing
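To make the time-based layer of correlation in the list above more concrete, here is a minimal sketch of how alerts might be grouped into incidents by service and time window. It is an illustration only, not CloudFabrix's implementation: the `Alert` and `Incident` types, the `correlate_by_time` function, and the 300-second window are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    service: str      # hypothetical field: the service that raised the alert
    timestamp: float  # seconds since an arbitrary starting point
    message: str


@dataclass
class Incident:
    service: str
    alerts: list = field(default_factory=list)


def correlate_by_time(alerts, window_seconds=300.0):
    """Group alerts from the same service into one incident when each alert
    arrives within `window_seconds` of the previous alert for that service."""
    incidents = []
    open_incident = {}  # service name -> most recent incident for that service
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_incident.get(alert.service)
        if current is not None and (
            alert.timestamp - current.alerts[-1].timestamp <= window_seconds
        ):
            current.alerts.append(alert)
        else:
            current = Incident(service=alert.service, alerts=[alert])
            incidents.append(current)
            open_incident[alert.service] = current
    return incidents


sample_alerts = [
    Alert("db", 0, "connection pool exhausted"),
    Alert("web", 50, "HTTP 500 rate high"),
    Alert("db", 100, "slow queries"),
    Alert("db", 1000, "replica lag"),
]
incidents = correlate_by_time(sample_alerts)

# Share of alerts that no longer need to be looked at individually.
noise_reduction = 1 - len(incidents) / len(sample_alerts)
print(len(incidents), noise_reduction)
```

A production system would layer topology-aware and ML-based correlation on top of a simple window like this; the sketch only shows why grouping alone already cuts the number of items an operator has to triage.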
The CloudFabrix data-centric AIOps platform uses a robotic data automation fabric (RDAF) with more than 1,000 bots for low-code/no-code development, composable workflows, data bots, and a distributed data fabric to provide key services, including AIOps, FinOps/asset intelligence, log intelligence, and composable analytics.
Use Cases
A large Telco responsible for 30% of the world's internet traffic chose CloudFabrix to improve its infrastructure and applications. They performed cross-domain correlation, alert noise reduction, and full-stack service mapping. They realized 97% noise reduction, 60%+ MTTI, 50%+ MTTR, and consumer insights for cross-sell and up-sell.
A large FinTech is transforming AIOps with RDAF pipelines. They used CloudFabrix to improve business agility and the accuracy of decision-making. Specifically, they implemented discovery of applications and IT stacks, ML-based correlation, RCA, and incident resolution. The results were unified event visibility across applications, systems, network assets, and the tool stack; 97% event filtering with the correlation and suppression of 550,000 alerts and more than 15,000 incidents; context-aware triage dashboards for exchange (SCOM), digital next-gen (Splunk), and in view (ITRS) application incidents; MTTI reduced from 2+ hours to under 30 minutes through faster impact analysis; and MTTR reduced from 4 hours to less than 1 hour.
A large Healthcare provider with more than 250,000 assets used CloudFabrix for asset discovery, application dependency mapping, change management, utilization reports, root cause analysis, and incident resolution. They achieved 100% asset visibility, a 40% reduction in OPEX, and a reduction in application impact assessment time from 240 hours to 15 minutes.
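The before-and-after times quoted in these use cases can be turned into percentages with simple arithmetic. The sketch below treats the stated bounds ("2+ hours", "under 30 minutes", "less than 1 hour") as exact values, which is an assumption, so the results are boundary estimates rather than figures reported by CloudFabrix. The helper `percent_reduction` is a hypothetical name used only for illustration.

```python
def percent_reduction(before, after):
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100.0


# FinTech use case, times in minutes (bounds treated as exact values).
mtti_reduction = percent_reduction(before=120, after=30)
mttr_reduction = percent_reduction(before=240, after=60)

# Healthcare use case: impact assessment went from 240 hours to 15 minutes.
assessment_speedup = (240 * 60) / 15

print(mtti_reduction)      # 75.0
print(mttr_reduction)      # 75.0
print(assessment_speedup)  # 960.0
```

Because the source gives bounds rather than exact times, the real MTTI and MTTR reductions would be at least 75% under these assumptions.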
Six Differentiating Pillars
- RDAF platform — data integration, data preparation, eliminating data silos
- Full-stack service mapping and enrichment
- Composable dashboards, services, search, and pipelines
- AI pipeline synthesis and recommendation engine
- Train and retrain models with continuous ML — MLOps plus data ops
- Bridge the skills gap with low code/no code
Data Management Ops With Bots
The following is a sample of the data management operations that bots perform to reduce time and expense while improving speed to value and accuracy.
- Metadata/Data Shaping: metadata description, data filtering, aggregation, and mapping
- Data Security: data masking, data encryption, data decryption, access control, and auditing
- Data Integrity: data signing, data hashing, data checksum
- Data Transformation: data deduplication, implode, explode, transpose, mapping, binning, pivot/unpivot
- Data Delivery: files, bulk/batch data, streaming data, pub/sub topics, bookmarking
- Data Quality Enhancement: data dictionaries, data enrichment
- REST API/Integrations: generic REST client, API integrations, named bots
- Data Generation: data simulation, synthetic data for test/dev, what-if analysis
- Data Formatting: data conversion, data formatting, data templating
- Data Governance: data lineage, data tracing, pipeline tracing, data version control
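As a rough illustration of how a few of these operations might be chained, the sketch below composes three small steps (deduplication, masking, and checksums) into a pipeline over a list of records. This is not the RDAF bot API: `deduplicate`, `mask_field`, `add_checksum`, `run_pipeline`, and the sample records are all hypothetical, written in plain Python with only the standard library.

```python
import hashlib
import json


def deduplicate(records):
    """Data transformation: drop records that are exact duplicates."""
    seen = set()
    unique = []
    for record in records:
        key = json.dumps(record, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def mask_field(records, field_name, visible=2):
    """Data security: keep the first `visible` characters, mask the rest."""
    masked = []
    for record in records:
        value = str(record[field_name])
        copy = dict(record)
        copy[field_name] = value[:visible] + "*" * max(len(value) - visible, 0)
        masked.append(copy)
    return masked


def add_checksum(records):
    """Data integrity: attach a SHA-256 digest of each record's contents."""
    out = []
    for record in records:
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        copy = dict(record)
        copy["sha256"] = hashlib.sha256(payload).hexdigest()
        out.append(copy)
    return out


def run_pipeline(records, steps):
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        records = step(records)
    return records


# Made-up sample data for the example.
raw = [
    {"host": "web-01", "owner": "alice@example.com", "cpu": 91},
    {"host": "web-01", "owner": "alice@example.com", "cpu": 91},
    {"host": "db-01", "owner": "bob@example.com", "cpu": 35},
]

result = run_pipeline(
    raw,
    [deduplicate, lambda recs: mask_field(recs, "owner"), add_checksum],
)
for row in result:
    print(row["host"], row["owner"], row["sha256"][:12])
```

Each step takes a list of records and returns a new list, which is what lets them be reordered or swapped freely. That composability is the general idea behind bot-based pipelines, independent of any particular product.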
The future of AIOps seems to be more robotic data automation applied throughout the enterprise.