News

The $90 billion race to supercharge customer service with AI

Published on

September 4, 2025


Capital Brief: The customer service software industry is both large and uniquely ripe for AI disruption. Australian startups like Brainfish are hoping to lead that disruption.

For the first year of its existence, Sydney startup Brainfish focused on answering a single question: How can AI be used to make online self-service both faster and better?

The solution Brainfish developed will be familiar to anyone who has used ChatGPT or similar AI-powered chatbots. Now in its second year, the startup is moving to the next level. Brainfish’s current objective is to solve customer problems directly, so they don’t need to talk to an agent at all.

“We want to be looking at the actual behaviour of the user, anticipating when they’re going to need help, and suggesting to them what they should be doing,” said co-founder and CEO Daniel Kimber, an alumnus of SiteMinder and Me&U.

Take the example of Smokeball, a SaaS platform for small and medium-sized law firms. At first, Brainfish’s software acted as a chatbot that could answer customer questions. Now, it has grown into something more like a customer success manager, guiding Smokeball’s users with notifications and recommendations on how to get more out of the platform.

Generative AI has drawn lofty comparisons to the Industrial Revolution, with many breathless descriptions of a technological upheaval that will change every industry it touches. That narrative has been more closely scrutinised in recent months as investors begin to expect returns on the many billions of dollars companies like Alphabet, Microsoft, and Meta are spending on cutting-edge AI capabilities.

While it is impossible to say whether AI is overhyped or to predict where its impact will be felt most dramatically, customer service appears ripe for the disruption promised by large language models. These AI models can read data on a business’s processes and customers, then use that information to be more helpful and sound more human than chatbots of the past.

If the customer service industry is among the first dominoes to fall, the impact will be significant. The market for customer service software is expected to exceed US$60 billion ($92 billion) by 2031, according to Verified Market Research.

There are other Australian startups in this space too. Optech makes AI chatbots for sectors where errors are costly, such as healthcare and fintech. Relevance AI creates worker bots for many industries, including customer service. Meanwhile, Brainfish intends to expand its presence in the US later this year as it doubles down on its efforts to take a slice of that fast-rising market.

“It’s a no-brainer: if you want to succeed in tech, it’s still the best market to be in,” said Kimber.

Big Tech players

Tech titans in that market have similar ideas, however. In recent months, Qualtrics — which was bought by tech-focused private equity company Silver Lake for US$12.5 billion last May — and software goliath Salesforce have launched their own attempts at reinventing customer service with artificial intelligence.

“We believe customer service is at the starting point of one of the biggest transformation opportunities there is,” said Kishan Chetan, executive vice president of Salesforce’s Service Cloud division.

In July, months after launching its Einstein Copilot for employees, Salesforce unveiled Einstein Service Agent — its initiative to reshape customer service.

With the tool, Salesforce clients can buy access to a chatbot that responds to customer issues in their brand’s voice, even handling image-based problems. The aim is to free human agents from simple tasks, allowing them to focus on upselling and relationship building.

“Increasingly, customer service teams are also tasked with driving growth,” Chetan said. “Frankly, a lot of businesses, as they're growing, are just unable to hire people and retain people enough to manage their growth.”

Salesforce benefits from decades of business and customer data, amassed during its years dominating the customer relationship management (CRM) software market. Qualtrics is riding a similar wave. The company’s strength lies in gathering customer feedback and transforming it into insights. Its latest product suite features AI tools for data analytics, employee management and customer service.

These tools help human agents serve customers more effectively, offering prompts on how to respond and suggesting solutions, including future discounts. Like Salesforce, Qualtrics benefits from a long list of existing big business clients. In Australia, these include Woolworths, Virgin Australia and Bankwest. This advantage will compound, said president of products Brad Anderson, when the company builds its own large language model.

“We have a data set that has never been used to train any other AI model, and that is the non-public data from 20,000 organisations,” said Anderson in an interview during a July visit to Sydney. “That’s the path that we’re heading down right now.”

Kimber counters that the advantage of Brainfish and similar startups is their singular focus. Customer service is a nice upsell opportunity for Salesforce, he said, but it will always be secondary to their core product of customer relationship management.

It is equally unclear who the biggest winners and losers will be. Deep-pocketed tech companies generally avoid suggesting their software will replace people. Qualtrics’ Anderson said that AI won’t replace people so much as people using AI will replace those who don’t. Chetan believes agents using Salesforce’s tech will serve more customers, with hiring unlikely to be affected given the high demand for customer support.

"Human support agents should be there, but they should be there supporting from a relationship perspective, from a complexity perspective," Kimber said.

"A lot of the tooling has been built around essentially an old concept of customer service, based around there being a lot of tooling built around human support. The reality is, that is changing."

- Daniel Van Bloom, Tech Correspondent @ Capital Brief

Read the full article here.
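
For readers interested in the engineering behind agents like these, the sketch below shows, in Python, how an AI support agent might be instrumented for observability and continuously evaluated. It is a minimal mock-up rather than a production integration: the MultimodalSupportAgent class and its model endpoint are hypothetical, the NeMo Evaluator call is simulated with hard-coded scores, and the OpenTelemetry exporters simply print telemetry to the console.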

import time
import requests  # used only by the real NeMo Evaluator HTTP call, shown commented out below
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader

# --- 1. OpenTelemetry Setup for Observability ---
# Configure exporters to print telemetry data to the console.
# In a production system, these would export to a backend like Prometheus or Jaeger.
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)
span_processor = SimpleSpanProcessor(ConsoleSpanExporter())
trace.get_tracer_provider().add_span_processor(span_processor)

metric_reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[metric_reader]))
meter = metrics.get_meter(__name__)

# Create custom OpenTelemetry metrics
agent_latency_histogram = meter.create_histogram("agent.latency", unit="ms", description="Agent response time")
agent_invocations_counter = meter.create_counter("agent.invocations", description="Number of times the agent is invoked")
# Note: create_gauge (a synchronous gauge) requires opentelemetry-sdk >= 1.23.
hallucination_rate_gauge = meter.create_gauge("agent.hallucination_rate", unit="1", description="Rate of hallucinated responses, as a fraction")
pii_exposure_counter = meter.create_counter("agent.pii_exposure.count", description="Count of responses with PII exposure")

# --- 2. Define the Agent using NeMo Agent Toolkit concepts ---
# The NeMo Agent Toolkit orchestrates agents, tools, and workflows, often via configuration.
# This class simulates an agent that would be managed by the toolkit.
class MultimodalSupportAgent:
    def __init__(self, model_endpoint):
        self.model_endpoint = model_endpoint

    # The toolkit would route incoming requests to this method.
    def process_query(self, query, context_data):
        # Start an OpenTelemetry span to trace this specific execution.
        with tracer.start_as_current_span("agent.process_query") as span:
            start_time = time.perf_counter()  # monotonic clock, preferred for measuring durations
            span.set_attribute("query.text", query)
            span.set_attribute("context.data_types", [type(d).__name__ for d in context_data])

            # In a real scenario, this would involve complex logic and tool calls.
            print(f"\nAgent processing query: '{query}'...")
            time.sleep(0.5) # Simulate work (e.g., tool calls, model inference)
            agent_response = f"Generated answer for '{query}' based on provided context."
            
            latency = (time.perf_counter() - start_time) * 1000  # elapsed time in ms
            
            # Record metrics
            agent_latency_histogram.record(latency)
            agent_invocations_counter.add(1)
            span.set_attribute("agent.response", agent_response)
            span.set_attribute("agent.latency_ms", latency)
            
            return {"response": agent_response, "latency_ms": latency}

# --- 3. Define the Evaluation Logic using NeMo Evaluator ---
# This function simulates calling the NeMo Evaluator microservice API.
def run_nemo_evaluation(agent_response, ground_truth_data):
    with tracer.start_as_current_span("evaluator.run") as span:
        print("Submitting response to NeMo Evaluator...")
        # In a real system, you would make an HTTP request to the NeMo Evaluator service.
        # eval_endpoint = "http://nemo-evaluator-service/v1/evaluate"
        # payload = {"response": agent_response, "ground_truth": ground_truth_data}
        # response = requests.post(eval_endpoint, json=payload)
        # evaluation_results = response.json()
        
        # Mocking the evaluator's response for this example.
        time.sleep(0.2) # Simulate network and evaluation latency
        mock_results = {
            "answer_accuracy": 0.95,
            "hallucination_rate": 0.05,
            "pii_exposure": False,
            "toxicity_score": 0.01,
            "latency": 25.5
        }
        span.set_attribute("eval.results", str(mock_results))
        print(f"Evaluation complete: {mock_results}")
        return mock_results

# --- 4. The Main Agent Evaluation Loop ---
def agent_evaluation_loop(agent, query, context, ground_truth):
    with tracer.start_as_current_span("agent_evaluation_loop") as parent_span:
        # Step 1: Agent processes the query
        output = agent.process_query(query, context)

        # Step 2: Response is evaluated by NeMo Evaluator
        eval_metrics = run_nemo_evaluation(output["response"], ground_truth)

        # Step 3: Log evaluation results using OpenTelemetry metrics
        hallucination_rate_gauge.set(eval_metrics.get("hallucination_rate", 0.0))
        if eval_metrics.get("pii_exposure", False):
            pii_exposure_counter.add(1)
        
        # Add evaluation metrics as events to the parent span for rich, contextual traces.
        parent_span.add_event("EvaluationComplete", attributes=eval_metrics)

        # Step 4: (Optional) Trigger retraining or alerts based on metrics
        if eval_metrics["answer_accuracy"] < 0.8:
            print("[ALERT] Accuracy has dropped below threshold! Triggering retraining workflow.")
            parent_span.set_status(trace.Status(trace.StatusCode.ERROR, "Low Accuracy Detected"))

# --- Run the Example ---
if __name__ == "__main__":
    support_agent = MultimodalSupportAgent(model_endpoint="http://model-server/invoke")
    
    # Simulate an incoming user request with multimodal context
    user_query = "What is the status of my recent order?"
    context_documents = ["order_invoice.pdf", "customer_history.csv"]
    ground_truth = {"expected_answer": "Your order #1234 has shipped."}

    # Execute the loop
    agent_evaluation_loop(support_agent, user_query, context_documents, ground_truth)
    
    # In a real application, the metric reader would run in the background.
    # We call it explicitly here to see the output.
    metric_reader.collect()
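
In a live deployment, the console exporters above would be replaced with OTLP exporters feeding a backend such as Jaeger (for traces) or Prometheus (for metrics), and run_nemo_evaluation would make a real HTTP request to a NeMo Evaluator endpoint instead of returning mocked scores.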