The Problem
Service desk teams handle large volumes of tickets every day. Priority levels, SLA compliance, and resolution times were tracked, but user sentiment was not: how a customer actually felt about their experience was anecdotal at best and entirely invisible at worst.
The consequences were predictable: dissatisfied users were identified late, escalations were handled reactively, and there was no systematic way to correlate service performance with how users perceived it. Management could see that tickets were being closed, but couldn't see whether users were walking away satisfied or frustrated.
What We Built
End-to-End AI Sentiment Pipeline
We designed and built a fully automated sentiment prediction pipeline — from data ingestion through to Power BI visualisation — with a fine-tuned transformer model at its core. The system processes both historical tickets (for trend analysis) and new tickets as they arrive (for live operational monitoring).
ServiceNow Data Ingestion
Ticket data is retrieved directly from ServiceNow via REST API. The system captures a rich set of ticket attributes — not just the description, but the full context needed to understand sentiment accurately:
- Short description and full description
- Comments and work notes
- Priority, escalation flags, and SLA indicators
- Assigned agent and business service
- Opened and closed timestamps, resolution time
- User, company, and system class name
This breadth of data ensures the model works from real ticket context — not isolated text snippets stripped of the information that makes them interpretable.
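For illustration, the retrieval step can be sketched as a single call to the ServiceNow Table API. The instance URL, credentials, and exact field list below are placeholders rather than our production configuration:

```python
# Minimal sketch of the ServiceNow Table API pull; instance, credentials,
# and field names are illustrative placeholders.
import requests

INSTANCE = "https://example.service-now.com"  # hypothetical instance URL

FIELDS = ",".join([
    "number", "short_description", "description", "comments", "work_notes",
    "priority", "escalation", "made_sla", "assigned_to", "business_service",
    "opened_at", "closed_at", "calendar_duration", "opened_by", "company",
    "sys_class_name",
])

def fetch_tickets(query: str, limit: int = 1000) -> list[dict]:
    """Retrieve ticket records from the ServiceNow Table API as a list of dicts."""
    response = requests.get(
        f"{INSTANCE}/api/now/table/incident",
        params={
            "sysparm_query": query,           # e.g. tickets updated in the last day
            "sysparm_fields": FIELDS,         # pull only the attributes the model needs
            "sysparm_display_value": "true",  # human-readable values for reference fields
            "sysparm_limit": limit,
        },
        auth=("api_user", "api_password"),    # placeholder credentials
        headers={"Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["result"]
```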
Sentiment Definition and Labelling
Before training, we established a clear and consistent sentiment definition aligned with operational decision-making:
- Positive sentiment: Tickets reflecting satisfaction — labelled Good or Great
- Negative sentiment: Tickets reflecting dissatisfaction — labelled OK or Bad
Approximately 1,000 historical tickets were manually reviewed and labelled by domain experts to create a reliable, balanced training dataset. This upfront labelling investment is what made the model genuinely useful — rather than a generic sentiment classifier trained on unrelated data.
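To make that definition concrete, the mapping from the four review labels to the binary training target can be sketched as below; the column names and file paths are illustrative, not our production schema:

```python
# Hypothetical mapping from the four expert review labels to the binary target.
import pandas as pd

LABEL_TO_SENTIMENT = {"Great": 1, "Good": 1, "OK": 0, "Bad": 0}  # 1 = positive, 0 = negative

labelled = pd.read_csv("labelled_tickets.csv")  # ~1,000 expert-reviewed tickets
labelled["label"] = labelled["rating"].map(LABEL_TO_SENTIMENT)

# Hold out a stratified test split so both classes are represented in evaluation.
test = labelled.groupby("label").sample(frac=0.2, random_state=42)
train = labelled.drop(test.index)
train.to_csv("tickets_train.csv", index=False)
test.to_csv("tickets_test.csv", index=False)
```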
Model Selection and Training
We evaluated three candidate models, including a traditional Random Forest classifier, before selecting the final approach.
The selected model is a fine-tuned DistilBERT (uncased): a lightweight, distilled variant of BERT that is well suited to natural language classification tasks. It was trained on the labelled ticket dataset, with care taken to balance positive and negative examples to reduce prediction bias.
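As a sketch of the fine-tuning step, training DistilBERT on the labelled tickets with the Hugging Face Transformers library looks roughly like this; the hyperparameters and file names are assumptions, not our production values:

```python
# Fine-tuning sketch: hyperparameters and paths are illustrative.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "distilbert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Labelled tickets exported as CSVs with "text" and "label" columns (0 = negative, 1 = positive).
dataset = load_dataset(
    "csv",
    data_files={"train": "tickets_train.csv", "test": "tickets_test.csv"},
)

def tokenize(batch):
    # Truncate long ticket text to DistilBERT's 512-token input limit.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

tokenized = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="sentiment-distilbert",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
trainer.evaluate()                                # loss on the held-out split
trainer.save_model("sentiment-distilbert/final")  # loaded later by the scoring step
```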
Automated Prediction Workflow
The prediction pipeline runs automatically on a schedule:
- Ticket data is retrieved from ServiceNow via REST API
- Relevant fields are preprocessed and prepared for the model
- Text data is passed to the fine-tuned DistilBERT model
- Each ticket is classified as positive or negative sentiment
- Results are stored and made available to the reporting layer
- Power BI datasets are refreshed automatically
The entire process, from a new ticket arriving in ServiceNow to its sentiment appearing in a Power BI dashboard, requires zero manual intervention.
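A simplified sketch of the scheduled scoring step is shown below; it assumes the model saved during training, the fetch_tickets helper from the ingestion sketch, and hypothetical field and table names:

```python
# Scoring sketch: field names, label ids, and the storage target are assumptions.
import pandas as pd
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="sentiment-distilbert/final",  # fine-tuned model saved during training
)

def score_tickets(tickets: list[dict]) -> pd.DataFrame:
    """Classify each ticket and return a frame ready for the reporting layer."""
    texts = [
        " ".join(filter(None, [t.get("short_description"), t.get("description"), t.get("comments")]))
        for t in tickets
    ]
    predictions = classifier(texts, truncation=True, max_length=512, batch_size=16)
    return pd.DataFrame({
        "number": [t.get("number") for t in tickets],
        # Default Transformers label ids; LABEL_1 corresponds to the positive class here.
        "sentiment": ["positive" if p["label"] == "LABEL_1" else "negative" for p in predictions],
        "confidence": [p["score"] for p in predictions],
    })

# Example run: score yesterday's tickets and append them to the table Power BI reads.
# scored = score_tickets(fetch_tickets("sys_updated_on>=javascript:gs.daysAgoStart(1)"))
# scored.to_sql("ticket_sentiment", reporting_engine, if_exists="append", index=False)
```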
Power BI Reporting
The dashboard layer translates AI outputs into operational insight. Service managers can track sentiment trends over time, identify which services or ticket categories consistently generate negative sentiment, and correlate SLA compliance with user experience — giving them the evidence base to prioritise improvement initiatives where they'll have the most impact.
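The refresh that keeps those dashboards current can be triggered programmatically through the Power BI REST API; the dataset ID below is a placeholder, and the access token is assumed to come from an Azure AD app registration:

```python
# Sketch of queuing a Power BI dataset refresh; dataset ID and token handling are placeholders.
import requests

DATASET_ID = "00000000-0000-0000-0000-000000000000"  # hypothetical dataset ID

def refresh_sentiment_dataset(access_token: str) -> None:
    """Ask the Power BI service to refresh the sentiment dataset."""
    response = requests.post(
        f"https://api.powerbi.com/v1.0/myorg/datasets/{DATASET_ID}/refreshes",
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    response.raise_for_status()  # 202 Accepted means the refresh request was queued
```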
Tech Stack
- ServiceNow REST API for ticket data ingestion
- Fine-tuned DistilBERT (uncased) transformer for sentiment classification
- Scheduled, fully automated prediction and refresh pipeline
- Power BI for reporting and dashboards
The Results
Service managers now have a live, quantified view of user sentiment across their ticket queue — something that previously didn't exist. Dissatisfied users are identified proactively rather than after an escalation. The SLA-sentiment correlation gives leadership a more complete picture of service quality than SLA compliance alone ever could. And because the pipeline is fully automated, the insight is always current.
Key Takeaways
- Sentiment analysis on service tickets is only as good as the labelling. Investing time in expert-reviewed, domain-specific labels is what separates a genuinely useful model from a generic one.
- Transformer models aren't just more accurate than traditional classifiers for text — they understand context. A ticket that says "finally resolved" reads very differently depending on what came before it. DistilBERT handles that nuance; Random Forest does not.
- The value of AI in operational reporting isn't the model accuracy number — it's the decisions the model enables. Connecting sentiment to SLA data and surfacing it in dashboards that service managers already use is what makes the insight actionable.