The Environmental Impact of Brainfish’s AI

Published on October 15, 2025


AI's environmental impact is a growing concern, but running inference on pre-trained models (which is what Brainfish primarily does) uses far less energy than training new models from scratch; its footprint is comparable to that of a standard SaaS application. By focusing on small, specialized models and leveraging AWS's renewable energy commitments, we're proving that powerful AI tools don't have to come with a massive carbon footprint.

Curious about the environmental impact of Brainfish? It's far smaller than you might think. And in a world where AI is becoming the norm, it’s a point worth diving into.

When we talk about AI's carbon footprint, the conversation often focuses on just one part of the story: model training. This is the process of teaching a large language model using huge datasets, and it requires a staggering amount of energy. To give you some perspective, a single major AI model training run can have a carbon footprint equivalent to hundreds of thousands of miles driven by a gasoline-powered car.
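
To make that comparison concrete, here is a rough back-of-envelope calculation in Python. The figures are illustrative estimates only: a widely cited academic estimate of one large training run's emissions (Strubell et al., 2019) and an approximate emissions factor for an average gasoline car. They are not measurements of any particular model.

# Back-of-envelope: converting an estimated training footprint into car-miles.
# All figures below are illustrative assumptions, not measurements.

TRAINING_EMISSIONS_LB_CO2E = 626_000   # Strubell et al. (2019) estimate for one large
                                       # transformer trained with neural architecture search
LB_TO_KG = 0.4536
CAR_KG_CO2E_PER_MILE = 0.40            # approximate EPA figure for an average gasoline car

training_kg = TRAINING_EMISSIONS_LB_CO2E * LB_TO_KG
equivalent_miles = training_kg / CAR_KG_CO2E_PER_MILE

print(f"Estimated training footprint: {training_kg:,.0f} kg CO2e")
print(f"Equivalent driving distance:  {equivalent_miles:,.0f} miles")
# Roughly 700,000 miles under these assumptions, i.e. hundreds of thousands of miles driven.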

However, once a model is trained, running it (inference) is a different story.

The majority of our energy usage comes from running inference on pre-trained models.

This is far less energy-intensive, similar to the energy consumption of any standard SaaS application. We don't train new foundational models from scratch for each customer; we use models that have already been trained, making Brainfish incredibly efficient from a power perspective.

As researchers have noted, inference requires dramatically less energy than training; a single response is closer in scale to the power usage of a simple web search. We're a customer-centric company, and that extends to being an environmentally conscious partner. We do have “fish” in our name, after all.
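
To give a sense of that per-query scale, here is a small illustrative sketch. Both energy figures are assumptions chosen for comparison (the ~0.3 Wh number often quoted for a web search, and a hypothetical per-response cost for a small pre-trained model); they are not measured values for Brainfish.

# Rough per-query comparison: inference on a small model vs. a web search.
# Both energy figures are illustrative assumptions, not measurements.

WEB_SEARCH_WH = 0.3             # frequently quoted estimate for one web search
SMALL_MODEL_RESPONSE_WH = 0.2   # assumed energy for one response from a small, pre-trained model
QUERIES_PER_MONTH = 100_000     # hypothetical monthly query volume

monthly_kwh = SMALL_MODEL_RESPONSE_WH * QUERIES_PER_MONTH / 1000
print(f"One response is ~{SMALL_MODEL_RESPONSE_WH / WEB_SEARCH_WH:.1f}x a web search")
print(f"~{monthly_kwh:.0f} kWh per month at {QUERIES_PER_MONTH:,} queries")
# About 20 kWh/month under these assumptions, less than a typical household refrigerator uses.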

Our Responsible AI Philosophy

We believe that truly innovative technology should not come at the cost of sustainability. From day one, our team has been focused on building a platform that is not only effective but also thoughtful in its design and footprint.

This is why we focus on leveraging small, pre-trained models. While larger, general-purpose models require massive, continuous training, our approach is more surgical. We've built Brainfish to solve a very specific problem: transforming your internal knowledge into a powerful, automated resource, and making it easily accessible to your team and customers. This narrow focus allows us to operate with a fraction of the energy and computational resources of a large AI company, a direct outcome of our commitment to responsible development.
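
One hedged way to see why small, specialized models matter: a common rule of thumb is that generating a token with a dense model takes roughly two floating-point operations per parameter, so per-answer compute scales directly with model size. The parameter counts in the sketch below are hypothetical and purely for illustration, not a description of the specific models Brainfish deploys.

# Rule of thumb: ~2 FLOPs per parameter per generated token at inference time.
# Parameter counts are illustrative assumptions, not Brainfish's actual models.

def inference_flops(params: float, tokens: int) -> float:
    """Approximate FLOPs to generate `tokens` tokens with a dense model of `params` parameters."""
    return 2 * params * tokens

SMALL_SPECIALIZED_MODEL = 3e9   # hypothetical ~3B-parameter specialized model
FRONTIER_SCALE_MODEL = 1e12     # hypothetical ~1T-parameter general-purpose model
ANSWER_LENGTH_TOKENS = 500

small = inference_flops(SMALL_SPECIALIZED_MODEL, ANSWER_LENGTH_TOKENS)
large = inference_flops(FRONTIER_SCALE_MODEL, ANSWER_LENGTH_TOKENS)
print(f"Small model:    {small:.2e} FLOPs per answer")
print(f"Frontier model: {large:.2e} FLOPs per answer ({large / small:.0f}x the compute)")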

Our commitment extends to our infrastructure as well. We run on Amazon Web Services (AWS) in the US East region. AWS has made its own significant commitment to sustainability, including a target to match 100% of the electricity used across its global operations with renewable energy and a goal to be "water positive" by 2030, returning more water to communities than its data centers consume.

It’s Only the Beginning

The conversation about AI and the environment is just beginning, and it's one that every company leveraging AI should be having. We believe that, as an industry, we have a responsibility not only to build powerful technology, but to do so consciously.

We are proud to be a part of the solution, showing that it's possible to deliver immense value without a massive environmental cost. Our model of responsible development shows that an intentional, principled approach can lead to a more sustainable, more efficient future for the entire tech landscape. We welcome this conversation with our customers and look forward to continuing to build technology that is both brilliant and responsible.

import time
import requests
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader

# --- 1. OpenTelemetry Setup for Observability ---
# Configure exporters to print telemetry data to the console.
# In a production system, these would export to a backend like Prometheus or Jaeger.
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)
span_processor = SimpleSpanProcessor(ConsoleSpanExporter())
trace.get_tracer_provider().add_span_processor(span_processor)

metric_reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[metric_reader]))
meter = metrics.get_meter(__name__)

# Create custom OpenTelemetry metrics
agent_latency_histogram = meter.create_histogram("agent.latency", unit="ms", description="Agent response time")
agent_invocations_counter = meter.create_counter("agent.invocations", description="Number of times the agent is invoked")
hallucination_rate_gauge = meter.create_gauge("agent.hallucination_rate", unit="percentage", description="Rate of hallucinated responses")
pii_exposure_counter = meter.create_counter("agent.pii_exposure.count", description="Count of responses with PII exposure")

# --- 2. Define the Agent using NeMo Agent Toolkit concepts ---
# The NeMo Agent Toolkit orchestrates agents, tools, and workflows, often via configuration.
# This class simulates an agent that would be managed by the toolkit.
class MultimodalSupportAgent:
    def __init__(self, model_endpoint):
        self.model_endpoint = model_endpoint

    # The toolkit would route incoming requests to this method.
    def process_query(self, query, context_data):
        # Start an OpenTelemetry span to trace this specific execution.
        with tracer.start_as_current_span("agent.process_query") as span:
            start_time = time.time()
            span.set_attribute("query.text", query)
            span.set_attribute("context.data_types", [type(d).__name__ for d in context_data])

            # In a real scenario, this would involve complex logic and tool calls.
            print(f"\nAgent processing query: '{query}'...")
            time.sleep(0.5) # Simulate work (e.g., tool calls, model inference)
            agent_response = f"Generated answer for '{query}' based on provided context."
            
            latency = (time.time() - start_time) * 1000
            
            # Record metrics
            agent_latency_histogram.record(latency)
            agent_invocations_counter.add(1)
            span.set_attribute("agent.response", agent_response)
            span.set_attribute("agent.latency_ms", latency)
            
            return {"response": agent_response, "latency_ms": latency}

# --- 3. Define the Evaluation Logic using NeMo Evaluator ---
# This function simulates calling the NeMo Evaluator microservice API.
def run_nemo_evaluation(agent_response, ground_truth_data):
    with tracer.start_as_current_span("evaluator.run") as span:
        print("Submitting response to NeMo Evaluator...")
        # In a real system, you would make an HTTP request to the NeMo Evaluator service.
        # eval_endpoint = "http://nemo-evaluator-service/v1/evaluate"
        # payload = {"response": agent_response, "ground_truth": ground_truth_data}
        # response = requests.post(eval_endpoint, json=payload)
        # evaluation_results = response.json()
        
        # Mocking the evaluator's response for this example.
        time.sleep(0.2) # Simulate network and evaluation latency
        mock_results = {
            "answer_accuracy": 0.95,
            "hallucination_rate": 0.05,
            "pii_exposure": False,
            "toxicity_score": 0.01,
            "latency": 25.5
        }
        span.set_attribute("eval.results", str(mock_results))
        print(f"Evaluation complete: {mock_results}")
        return mock_results

# --- 4. The Main Agent Evaluation Loop ---
def agent_evaluation_loop(agent, query, context, ground_truth):
    with tracer.start_as_current_span("agent_evaluation_loop") as parent_span:
        # Step 1: Agent processes the query
        output = agent.process_query(query, context)

        # Step 2: Response is evaluated by NeMo Evaluator
        eval_metrics = run_nemo_evaluation(output["response"], ground_truth)

        # Step 3: Log evaluation results using OpenTelemetry metrics
        hallucination_rate_gauge.set(eval_metrics.get("hallucination_rate", 0.0))
        if eval_metrics.get("pii_exposure", False):
            pii_exposure_counter.add(1)
        
        # Add evaluation metrics as events to the parent span for rich, contextual traces.
        parent_span.add_event("EvaluationComplete", attributes=eval_metrics)

        # Step 4: (Optional) Trigger retraining or alerts based on metrics
        if eval_metrics["answer_accuracy"] < 0.8:
            print("[ALERT] Accuracy has dropped below threshold! Triggering retraining workflow.")
            parent_span.set_status(trace.Status(trace.StatusCode.ERROR, "Low Accuracy Detected"))

# --- Run the Example ---
if __name__ == "__main__":
    support_agent = MultimodalSupportAgent(model_endpoint="http://model-server/invoke")
    
    # Simulate an incoming user request with multimodal context
    user_query = "What is the status of my recent order?"
    context_documents = ["order_invoice.pdf", "customer_history.csv"]
    ground_truth = {"expected_answer": "Your order #1234 has shipped."}

    # Execute the loop
    agent_evaluation_loop(support_agent, user_query, context_documents, ground_truth)
    
    # In a real application, the metric reader would run in the background.
    # We call it explicitly here to see the output.
    metric_reader.collect()