The Chatbot Graveyard: Why Most Self-Service Fails (And How to Fix It)

Published on September 4, 2025

"90% ticket deflection!" the sales rep proclaimed proudly, not realizing that number was a glaring red flag. When you're drowning in support tickets, that kind of automation sounds like salvation, but if your AI is handling that many tickets, you probably have a bigger problem on your hands. The uncomfortable truth is that most chatbots are failing (spectacularly) and turning products into digital mazes that frustrate users and make everyone a little more skeptical about AI in customer support. – Daniel Kimber, CEO, Brainfish

I can't tell you how many times I've cringed during AI product demos when I hear: "90% ticket deflection rate!"

When you're drowning in support tickets, that number sounds like salvation. But here's what I've learned after years in B2B support: if your AI is handling that many tickets, you probably have a bigger problem on your hands.

The uncomfortable truth is that most self-service initiatives are failing. Not quietly, but spectacularly. Companies are building digital mazes that frustrate users, waste money, and make everyone a little more skeptical about AI in customer support.

I was talking with James Pavlovich from Straumann Group about this recently. He nailed the problem: "These solutions go in and you put up walls to get to a person... nobody thinks about the customer's perspective – they've spent 10 minutes trying to get past this wall of chat bots."

That's the thing about B2B support – it's not like helping someone track a package or find the right shirt size. When enterprise customers reach out, they're often managing complex processes that affect their entire org. Their question might sound simple, but there's usually a whole iceberg of context and other people's concerns below the surface.

Where We're Getting It Wrong

I keep hearing the same story from CX leaders: Companies rush to implement AI support, promise the world, and end up with... a very expensive way to frustrate their customers.

Kristi Faltorusso, who's seen this play out countless times as CCO at ClientSuccess, puts it perfectly: "We've been measuring the wrong things for years. Ticket volume, Average Response Rate, Average Resolution Time, even CSAT – they only tell part of the story."

She's right. We're so obsessed with deflection metrics that we've forgotten what actually makes support work.

Think about your best support people. They don't just answer the question in front of them – they understand the why behind it. As Jim Smith, a veteran Customer Success Leader, told me: "The best support reps don't take the customer's question at face value, they take a step back and understand the why."

Yet somehow there are AI products that do exactly the opposite.

A Better Way Forward

Here's what's fascinating: The companies getting self-service right aren't focusing on deflection at all. They're obsessed with prevention.

Take Smokeball. They hit an 83% self-service rate – not by building better walls, but by making their product naturally easier to use. Their system works more like a video game, learning from how people actually use the product and adapting in real-time.

Another CX leader put it perfectly: "If AI is handling so many tickets, you're probably getting too many tickets in the first place." That's the key insight – stop treating the symptom and start fixing the cause.

What does this look like in practice? Jenny Eggimann, Head of Customer Success, shared what made the difference for her team: "We chose this approach because the analytics and user journey reporting showed us exactly where users might struggle before they ever needed to ask for help."
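
To make that concrete, here's a toy sketch of the idea: scan product usage events for signs of struggle – repeated visits to the same step, or a help search right after it – before anyone ever files a ticket. The event names and threshold below are invented for illustration, not taken from any real product:

from collections import Counter

# Hypothetical product events as (user_id, step, action) tuples -- invented for illustration.
events = [
    ("u1", "billing_settings", "view"),
    ("u1", "billing_settings", "view"),
    ("u1", "billing_settings", "view"),
    ("u1", "billing_settings", "search_help"),
    ("u2", "invite_teammate", "view"),
    ("u2", "invite_teammate", "complete"),
]

def friction_hotspots(events, repeat_threshold=3):
    """Flag steps a user revisits repeatedly or follows with a help search --
    a rough proxy for 'struggling before they ever ask for help'."""
    view_counts = Counter((user, step) for user, step, action in events if action == "view")
    help_searches = {(user, step) for user, step, action in events if action == "search_help"}
    flagged = set()
    for (user, step), count in view_counts.items():
        if count >= repeat_threshold or (user, step) in help_searches:
            flagged.add(step)
    return flagged

print(friction_hotspots(events))  # {'billing_settings'}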

What Natural Support Actually Feels Like

Think about how you learned to use Uber or Amazon. As one CX leader pointed out to me: "How many people had training on Uber? How many had training on Amazon? Nobody. Right? Like, you just download and you figure it out."

That's what great B2B support should feel like. Not a chatbot interrogation, but a natural part of using the product.

Yaniv Bernstein, COO and startup founder, described how this approach transformed the way their team works: "With other options, we would have spent countless hours familiarizing ourselves with the tool. Instead, we were able to tap into our existing knowledge base and boost our customer service efficiency with almost no initial setup."

The best systems just get it. They understand where you are in the product, what you're trying to do, and what you might need next. When you do need human help, you don't have to retell your whole story – the context carries through seamlessly.
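
And here's a rough sketch of what "the context carries through" could mean in practice. The field names are hypothetical – the point is simply that an escalation hands a human the session state along with the question, instead of making the customer start over:

from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class SessionContext:
    """In-product state gathered as the user works -- field names are invented."""
    account_id: str
    current_page: str
    recent_actions: List[str] = field(default_factory=list)
    self_service_tried: List[str] = field(default_factory=list)

def build_escalation(session: SessionContext, question: str) -> dict:
    """Pass the question to a human along with its context, so the customer
    never has to retell their story."""
    return {"question": question, "context": asdict(session)}

session = SessionContext(
    account_id="acct_42",
    current_page="/settings/billing",
    recent_actions=["opened latest invoice", "updated payment method"],
    self_service_tried=["billing FAQ", "update-card walkthrough"],
)
print(build_escalation(session, "Why was my card charged twice?"))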

As Kristi Faltorusso also notes, "The best performing teams are using data to anticipate customer needs before anyone submits the ticket." That's the dream, right? Solving problems before they become problems.

Time to Build Something Better

Here's what keeps me up at night: As AI hype grows and budgets tighten, more companies are going to be tempted by quick-fix automation promises. But there's a better way.

The most successful B2B companies I talk to are thinking differently. They're not asking "How can we deflect more tickets?" They're asking "How can we make our product so intuitive that users don't need to ask for help in the first place?"

That's the future of self-service. Not better chatbots, but better experiences. Not deflection rates, but customer success.

Besides, pure cost-cutting in customer experience is usually solving the wrong problem. When CX costs are less than 1% of revenue, you're not going to save your way to success! The real opportunity is making your product naturally easier to use.

It's time to build support that doesn't feel like support at all. Because the best support interaction? It's the one that never needs to happen in the first place.

Not because we built better walls. But because we finally built something that just makes sense.
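
For readers who want to put numbers behind "measuring the right things," the snippet below is an illustrative sketch, not a reference implementation: a simulated support agent instrumented with OpenTelemetry, plus a mocked call to an evaluation service (modeled loosely on NVIDIA's NeMo Evaluator), tracking answer accuracy, hallucination rate, PII exposure, and latency rather than deflection alone.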

import time
import requests
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader

# --- 1. OpenTelemetry Setup for Observability ---
# Configure exporters to print telemetry data to the console.
# In a production system, these would export to a backend like Prometheus or Jaeger.
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)
span_processor = SimpleSpanProcessor(ConsoleSpanExporter())
trace.get_tracer_provider().add_span_processor(span_processor)

metric_reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[metric_reader]))
meter = metrics.get_meter(__name__)

# Create custom OpenTelemetry metrics
agent_latency_histogram = meter.create_histogram("agent.latency", unit="ms", description="Agent response time")
agent_invocations_counter = meter.create_counter("agent.invocations", description="Number of times the agent is invoked")
# Synchronous gauges require a recent OpenTelemetry SDK (1.23 or newer). The rate is
# recorded as a 0-1 ratio, so the dimensionless unit "1" is used rather than a percentage.
hallucination_rate_gauge = meter.create_gauge("agent.hallucination_rate", unit="1", description="Rate of hallucinated responses (0-1 ratio)")
pii_exposure_counter = meter.create_counter("agent.pii_exposure.count", description="Count of responses with PII exposure")

# --- 2. Define the Agent using NeMo Agent Toolkit concepts ---
# The NeMo Agent Toolkit orchestrates agents, tools, and workflows, often via configuration.
# This class simulates an agent that would be managed by the toolkit.
class MultimodalSupportAgent:
    def __init__(self, model_endpoint):
        self.model_endpoint = model_endpoint

    # The toolkit would route incoming requests to this method.
    def process_query(self, query, context_data):
        # Start an OpenTelemetry span to trace this specific execution.
        with tracer.start_as_current_span("agent.process_query") as span:
            start_time = time.time()
            span.set_attribute("query.text", query)
            span.set_attribute("context.data_types", [type(d).__name__ for d in context_data])

            # In a real scenario, this would involve complex logic and tool calls.
            print(f"\nAgent processing query: '{query}'...")
            time.sleep(0.5) # Simulate work (e.g., tool calls, model inference)
            agent_response = f"Generated answer for '{query}' based on provided context."
            
            latency = (time.time() - start_time) * 1000
            
            # Record metrics
            agent_latency_histogram.record(latency)
            agent_invocations_counter.add(1)
            span.set_attribute("agent.response", agent_response)
            span.set_attribute("agent.latency_ms", latency)
            
            return {"response": agent_response, "latency_ms": latency}

# --- 3. Define the Evaluation Logic using NeMo Evaluator ---
# This function simulates calling the NeMo Evaluator microservice API.
def run_nemo_evaluation(agent_response, ground_truth_data):
    with tracer.start_as_current_span("evaluator.run") as span:
        print("Submitting response to NeMo Evaluator...")
        # In a real system, you would make an HTTP request to the NeMo Evaluator service.
        # eval_endpoint = "http://nemo-evaluator-service/v1/evaluate"
        # payload = {"response": agent_response, "ground_truth": ground_truth_data}
        # response = requests.post(eval_endpoint, json=payload)
        # evaluation_results = response.json()
        
        # Mocking the evaluator's response for this example.
        time.sleep(0.2) # Simulate network and evaluation latency
        mock_results = {
            "answer_accuracy": 0.95,
            "hallucination_rate": 0.05,
            "pii_exposure": False,
            "toxicity_score": 0.01,
            "latency": 25.5
        }
        span.set_attribute("eval.results", str(mock_results))
        print(f"Evaluation complete: {mock_results}")
        return mock_results

# --- 4. The Main Agent Evaluation Loop ---
def agent_evaluation_loop(agent, query, context, ground_truth):
    with tracer.start_as_current_span("agent_evaluation_loop") as parent_span:
        # Step 1: Agent processes the query
        output = agent.process_query(query, context)

        # Step 2: Response is evaluated by NeMo Evaluator
        eval_metrics = run_nemo_evaluation(output["response"], ground_truth)

        # Step 3: Log evaluation results using OpenTelemetry metrics
        hallucination_rate_gauge.set(eval_metrics.get("hallucination_rate", 0.0))
        if eval_metrics.get("pii_exposure", False):
            pii_exposure_counter.add(1)
        
        # Add evaluation metrics as events to the parent span for rich, contextual traces.
        parent_span.add_event("EvaluationComplete", attributes=eval_metrics)

        # Step 4: (Optional) Trigger retraining or alerts based on metrics
        if eval_metrics["answer_accuracy"] < 0.8:
            print("[ALERT] Accuracy has dropped below threshold! Triggering retraining workflow.")
            parent_span.set_status(trace.Status(trace.StatusCode.ERROR, "Low Accuracy Detected"))

# --- Run the Example ---
if __name__ == "__main__":
    support_agent = MultimodalSupportAgent(model_endpoint="http://model-server/invoke")
    
    # Simulate an incoming user request with multimodal context
    user_query = "What is the status of my recent order?"
    context_documents = ["order_invoice.pdf", "customer_history.csv"]
    ground_truth = {"expected_answer": "Your order #1234 has shipped."}

    # Execute the loop
    agent_evaluation_loop(support_agent, user_query, context_documents, ground_truth)
    
    # In a real application, the metric reader would run in the background.
    # We call it explicitly here to see the output.
    metric_reader.collect()