
In today’s rapidly evolving technology landscape, maintaining network reliability is paramount. Our team in Microsoft Digital, the company’s IT organization, keeps the company connected and maintains foundational network services for all our employees and guests. Our environment comprises 64,000 network devices across more than 700 buildings, supporting 350,000 users and over 1 million connected devices that together generate 170,000 incidents per year. At this scale, traditional methods of network management often fall short.
To manage at this scale, we’re envisioning and executing a future of AI transformation and are creating new AI solutions that will help us to be more efficient and agile. This is a critical part of our role as the team that’s responsible for powering, protecting, and transforming our digital employee experience across devices, applications, and hybrid infrastructure at Microsoft.
“As the organization that acts as Customer Zero for the company, we are leading with AI to help power Microsoft and keep things like our network infrastructure secure and reliable,” says Brian Fielder, vice president of tenant and platform management for Microsoft Digital. “We must aggressively experiment and adopt AI agents to foster innovation and model the future of service engineering.”
Within Microsoft Digital, our AIOps and Network Infrastructure Copilot (NiC) AI solutions have emerged as transformative new tools for enhancing network performance. These tools take advantage of AI-powered capabilities by turning data and knowledge into powerful insights and, eventually, meaningful actions to run the industry’s most secure and reliable enterprise network.
These tools share a common data architecture composed of millions of telemetry points, but each performs a distinct AI function in running and supporting the network. AIOps is an automation solution that uses data insights to prevent and resolve network issues before they have an impact. NiC is an interactive experience that helps network practitioners and a variety of other personas interact more easily, in natural language, with complex network services to get rapid answers and intelligent suggestions.
AIOps: Transforming how we deliver operational excellence

AIOps integrates AI into IT operations by automating and enhancing various aspects of network management. This approach significantly reduces manual intervention, allowing network engineers to focus on higher-value tasks. Key components of AIOps include:
- Automated ticketing and remediation: AIOps automates the creation and resolution of tickets, reducing the time and effort required to manage incidents. This automation is particularly beneficial in environments with high ticket volumes, ensuring timely and efficient resolution of network issues.
- Ticket noise reduction: AIOps uses AI-powered ticket correlation, suppression, and enrichment capabilities to significantly reduce ticketing noise, enabling engineers to concentrate on the most critical issues.
- Automatic remediation: AIOps executes automatic troubleshooting and remediation actions on behalf of engineers to mitigate issues and keep outage duration, and the associated business impact, to a minimum. These actions draw on existing troubleshooting knowledge and on remediation steps that have succeeded in the past.
- Postmortem report generation: AI-powered tools generate detailed postmortem reports for network incidents, providing insights into the root causes and recommended remediation steps. This capability enhances the learning process and helps prevent future occurrences.
AIOps components mapping (based on Gartner’s AIOps model)
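To make ticket correlation, suppression, and enrichment more concrete, here is a minimal Python sketch under simplifying assumptions. The `Alert` and `Ticket` types, the `correlate` and `enrich` functions, and the fixed time-window rule are all hypothetical and are not how AIOps is actually implemented. In the sketch, alerts that share a device and signal are folded into one ticket while they keep arriving within a configurable window. Each resulting ticket is then tagged with extra context.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    """A raw monitoring alert (hypothetical schema, for illustration only)."""
    device: str
    signal: str  # e.g. "link_down" or "high_cpu"
    timestamp: datetime


@dataclass
class Ticket:
    """One correlated ticket that may absorb many raw alerts."""
    device: str
    signal: str
    first_seen: datetime
    last_seen: datetime
    alert_count: int = 1
    tags: list[str] = field(default_factory=list)


def correlate(alerts: list[Alert], window: timedelta) -> list[Ticket]:
    """Fold alerts with the same (device, signal) into one ticket while each
    new alert arrives within `window` of the ticket's most recent alert."""
    latest: dict[tuple[str, str], Ticket] = {}
    tickets: list[Ticket] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.device, alert.signal)
        ticket = latest.get(key)
        if ticket is not None and alert.timestamp - ticket.last_seen <= window:
            # Repeat of an open issue: suppress it by counting it on the existing ticket.
            ticket.alert_count += 1
            ticket.last_seen = alert.timestamp
        else:
            # New issue, or the same issue recurring after a quiet gap: open a ticket.
            ticket = Ticket(alert.device, alert.signal, alert.timestamp, alert.timestamp)
            latest[key] = ticket
            tickets.append(ticket)
    return tickets


def enrich(tickets: list[Ticket], device_sites: dict[str, str]) -> None:
    """Attach context engineers would otherwise look up by hand (here, only the site)."""
    for ticket in tickets:
        ticket.tags.append(f"site={device_sites.get(ticket.device, 'unknown')}")


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    alerts = [
        Alert("sw-01", "link_down", t0),
        Alert("sw-01", "link_down", t0 + timedelta(minutes=2)),
        Alert("sw-02", "high_cpu", t0 + timedelta(minutes=1)),
        Alert("sw-01", "link_down", t0 + timedelta(minutes=4)),
        Alert("sw-01", "link_down", t0 + timedelta(hours=2)),
    ]
    tickets = correlate(alerts, window=timedelta(minutes=10))
    enrich(tickets, {"sw-01": "bldg-A"})
    for t in tickets:
        print(t.device, t.signal, t.alert_count, t.tags)
```

In this example, five raw alerts collapse into three tickets. The three `sw-01` link-down alerts within ten minutes become one ticket, and the same alert two hours later opens a fresh ticket. A production system would likely also correlate across related devices and signals, such as an uplink failure and the downstream switches it affects. This sketch does not attempt that.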

Network Infrastructure Copilot (NiC): The everyday AI assistant

NiC is an AI assistant designed to support network engineers in managing complex network environments. NiC provides powerful data insights and documentation, enabling engineers to design, configure, analyze, and troubleshoot network issues using natural language queries. Key features of NiC include:
- Data insights: NiC helps engineers extract valuable data insights from various sources, such as wikis, SharePoint libraries, troubleshooting guides, and the infrastructure data lake (IDL). This capability streamlines data analysis and supports better-informed decisions about which actions to take.
- Network artifacts interpretation: NiC understands key artifacts such as device logs and configurations, providing engineers with concise and relevant information. This capability greatly reduces the cognitive load engineers face in accessing and processing the critical data and insights needed to manage a complex network environment.
- Simplified network observability: NiC enables non-networking personas (such as conference room technicians and facilities managers) to get quick glances at the health and configuration of their services without requiring deep understanding of network protocols and taxonomies.
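This article doesn't describe NiC's internals. The following is only a rough, hypothetical illustration of one step such an assistant needs: ranking documentation snippets against a natural-language question. It uses a simple TF-IDF score and a tiny stopword list. A real assistant would more likely use embedding-based search combined with a large language model. The function names, document IDs, and text are invented for the example.

```python
import math
import re
from collections import Counter

# A tiny, illustrative stopword list; real systems use far richer text processing.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "why", "how", "what", "to", "in", "of", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def rank_documents(query: str, docs: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Return up to top_k (doc_id, score) pairs, best first, scoring each
    document by the TF-IDF weight of the query terms it contains."""
    doc_counts = {doc_id: Counter(tokenize(body)) for doc_id, body in docs.items()}
    # Document frequency: how many documents contain each term.
    df = Counter(term for counts in doc_counts.values() for term in counts)
    query_terms = set(tokenize(query))
    n_docs = len(docs)
    results = []
    for doc_id, counts in doc_counts.items():
        score = sum(
            counts[term] * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term in query_terms
            if term in counts
        )
        if score > 0:
            results.append((doc_id, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:top_k]


if __name__ == "__main__":
    docs = {
        "tsg-bgp": "Troubleshooting guide: BGP session flapping. Check neighbor state and hold timers.",
        "wiki-wifi": "Wireless access point onboarding for conference rooms.",
        "tsg-dhcp": "DHCP scope exhaustion: clients fail to get an address in the building.",
    }
    print(rank_documents("Why is the BGP neighbor flapping?", docs))
```

Here only the BGP troubleshooting guide matches the question's meaningful terms ("bgp", "neighbor", "flapping"), so it is the single result. The retrieved text would then be handed to the assistant to summarize for the engineer.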
“NiC has dramatically reduced the time engineers spend searching through documentation and other network artifacts to yield actionable insights, slashing effort from 25 minutes to under 5 minutes,” says Phil Suver, a principal group product manager in Microsoft Digital.
Transforming network engineering with a Copilot agent

Driving impact through AIOps and NiC
The integration of AIOps and NiC has significantly improved network reliability in several ways:
- Efficiency gains: The automation of routine tasks and the provision of actionable insights through AIOps and NiC have drastically reduced the cognitive load on network engineers. In the past several months, network practitioners have cumulatively saved over 11,000 hours in network service management. This efficiency gain frees engineers to focus on strategic initiatives that further enhance network performance.
- Rapid issue detection and resolution: AI-powered anomaly detection and automated remediation ensure that potential issues are identified and resolved before they impact network performance. This proactive approach minimizes downtime and enhances overall network reliability.
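The article doesn't describe the specific anomaly-detection models AIOps uses. As a hedged illustration of the general idea, the sketch below applies a classic rolling z-score check. It flags a telemetry sample that sits more than a chosen number of standard deviations from the mean of a trailing window. The function name, window size, threshold, and error-count series are assumptions chosen for the example.

```python
from statistics import mean, stdev


def detect_anomalies(values: list[float], window: int = 12, threshold: float = 3.0) -> list[int]:
    """Return indices whose value lies more than `threshold` standard deviations
    from the mean of the preceding `window` samples (a rolling z-score test)."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)  # sample standard deviation; needs window >= 2
        if sigma == 0:
            # Perfectly flat history: any deviation at all is treated as anomalous.
            if values[i] != mu:
                anomalies.append(i)
        elif abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical interface error counts, one sample per 5-minute interval.
    errors = [2, 3, 2, 4, 3, 2, 3, 2, 3, 4, 2, 3, 40, 3, 2]
    print(detect_anomalies(errors))  # → [12]
```

The spike at index 12 inflates the baseline for the next twelve samples, which can hide a second anomaly that follows closely. A more robust variant might exclude flagged points from the history, or use the median and median absolute deviation instead of the mean and standard deviation.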
Lessons learned in building and deploying AIOps and NiC
To achieve measurable and sustainable business impact, we established an AI-focused feedback loop that emphasizes rapid experimentation, value validation, and a continuous improvement mindset. The following are key lessons learned from our journey in developing and scaling AI solutions that enhance Microsoft’s network reliability.
- Prioritization: One significant tradeoff was resource allocation. The team had to decide how to distribute resources effectively to support the development of network AI solutions while maintaining other critical priorities such as the Secure Future Initiative (SFI).
- Change management: Implementing AIOps and NiC required significant changes in how network engineers and other stakeholders worked. The team had to manage organizational change and help network engineers adapt their workflows to incorporate AI effectively. This involved providing training, gathering feedback, and continuously improving the tools based on usage and learnings.
- Impact measurement: Measuring the impact of AI on time savings and productivity gains was essential. The team established baselines and targets, in the form of key performance indicators (KPIs), to track the employee productivity gains realized through AI, helping employees achieve more.
- Scalability and performance: Ensuring that NiC and AIOps could scale to handle the large number of devices and tickets was a critical consideration. The team had to make decisions about the architecture and design of these solutions to ensure they could support the growing demands of the network.
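Baseline-and-target KPIs of this kind reduce to simple arithmetic. The sketch below models one AI-assisted task. The 25-minute baseline and 5-minute current time come from the documentation-search figures quoted earlier. The task count and hours target are made-up values for illustration, not Microsoft data. The `ProductivityKpi` class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProductivityKpi:
    """Baseline-versus-current productivity KPI for one AI-assisted task (hypothetical model)."""
    name: str
    baseline_minutes: float    # time per task before the AI tool
    current_minutes: float     # time per task with the AI tool
    tasks_completed: int       # tasks performed in the measurement period
    target_hours_saved: float  # goal set for the same period

    @property
    def minutes_saved_per_task(self) -> float:
        return self.baseline_minutes - self.current_minutes

    @property
    def percent_reduction(self) -> float:
        return 100.0 * self.minutes_saved_per_task / self.baseline_minutes

    @property
    def hours_saved(self) -> float:
        return self.minutes_saved_per_task * self.tasks_completed / 60.0

    @property
    def target_progress(self) -> float:
        """Fraction of the target achieved; 1.0 means the target was met."""
        return self.hours_saved / self.target_hours_saved


if __name__ == "__main__":
    # 25 and 5 minutes are from the article; 1,200 tasks and a 500-hour target are invented.
    kpi = ProductivityKpi("documentation search", 25, 5, 1200, 500)
    print(kpi.percent_reduction, kpi.hours_saved, kpi.target_progress)  # → 80.0 400.0 0.8
```

The quote says "under 5 minutes," so 5 minutes is an upper bound on the current time. That makes the computed 80 percent reduction a lower bound on the improvement for that task.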
“By focusing on these key areas, we were able to deploy AI solutions that not only met immediate needs but also positioned us for long-term success and innovation,” says Eddie Lau, a principal product manager in Microsoft Digital.

Here are four key insights into how AIOps and Network Infrastructure Copilot (NiC) have transformed network management and performance:
- The integration of AIOps and Network Infrastructure Copilot (NiC) has automated routine tasks and provided actionable insights, reducing the cognitive load on network engineers and enhancing overall network performance.
- AI-powered anomaly detection and automated remediation ensure potential issues are identified and resolved before impacting network performance, minimizing downtime and enhancing reliability.
- Implementing AIOps and NiC required managing organizational changes, providing training, gathering feedback, and continuously improving the tools based on usage and learnings.
- Ensuring NiC and AIOps could scale to handle the large number of devices and tickets was crucial, requiring careful decisions about architecture and design.
