Fixing Show Thinking & System Inefficiencies
Fixing 'Show Thinking' & Optimizing Your System: A Deep Dive, Guys!
Hey guys! So, we've got a couple of things to tackle today, specifically the vanishing "Show Thinking" text and some areas where we can seriously boost the efficiency of your system. Let's dive in and fix these issues, making everything smoother and more effective. We'll break it down into easy-to-follow steps, so you can implement these changes and see immediate improvements. Get ready to level up your AI experience! I'll break down the code changes, and then explain why they are needed. It's all about making your system the best it can be, right?
Part 1: Solving the "Show Thinking" Vanishing Act
The Problem: The most annoying thing, the "Show Thinking" text disappears, leaving you in the dark about what the AI is cooking up. This is because of the way the server handles the "Final Save" step. The raw thoughts are streamed to you initially, but then a "cleaned" version is saved, and it strips out those precious insights. No good, right?
The Solution: The core issue lies in server/routes.ts. When the stream completes, the code calculates cleanContentForStorage, which, as mentioned, likely excludes the <thinking> tags (or the model's thoughts). We need a fix! We need to make sure we're explicitly capturing and forcing those "Thought" parts from the Gemini API into the saved message.
Here’s how we're going to fix it.
Step-by-Step Fix in server/routes.ts
-
Locate the File: Open
server/routes.ts. This is where all the action happens. -
Find the Streaming Loop: Within the
POST /api/chats/:id/messagesroute, look for the streaming loop. It usually looks like this:for await (const chunk of result) { // ... code inside the loop ... } -
Insert the Code: Inside this loop, we need to add code to capture the "Thought" text. Here's the updated code:
// ... inside the streaming loop ... for await (const chunk of result) { const text = chunk.text || ""; // NEW: Capture Native "Thinking" parts explicitly // Gemini 2.0 Flash Thinking returns thoughts in a separate part let thoughtText = ""; if (chunk.candidates?.[0]?.content?.parts) { for (const part of chunk.candidates[0].content.parts) { // @ts-ignore - The types might not be updated yet for 'thought' if (part.thought) { thoughtText += part.text; } } } if (text || thoughtText) { // If we have thought text, wrap it so the UI knows how to render/collapse it const contentChunk = thoughtText ? `<thinking>${thoughtText}</thinking>` : text; fullResponse += contentChunk; // CRITICAL: Add it to storage content too! cleanContentForStorage += contentChunk; res.write(`data: ${JSON.stringify({ text: contentChunk })}\n\n`); } // ... rest of loop }- Explanation: This code snippet checks for the
thoughtproperty within the Gemini API's response. If found, it extracts the thought text and wraps it in a<thinking>tag. The crucial part is addingcleanContentForStorage += contentChunk;This ensures the thinking text is saved. We're also making sure that if there is thought text, it gets sent directly to the client with theres.writecommand.
- Explanation: This code snippet checks for the
-
Save the File: Make sure you save your changes to
server/routes.ts. -
(Optional) UI Styling: You might want to update
client/src/components/chat/message.tsxto style that<thinking>tag (e.g., make it gray or collapsible). That way, the UI will be able to display the "Show Thinking" content in an appropriate way.
Part 2: Addressing System Inefficiencies - Wasting Time and Space
Alright, now that we've fixed the disappearing thoughts, let's turn our attention to some major inefficiencies in your system. I took a look at server/routes.ts and rag-service.ts, and I found some areas we can significantly optimize. These tweaks will save you time, space, and computing resources, making everything run much smoother. It's like getting a tune-up for your AI!
1. The "Junk Data" Ingestion - Stop the Noise
The Problem: Right now, your system ingests every single message into the Vector Database (RAG), even the short, meaningless ones like "Hi," "Okay," or "Thanks." This is a waste of space and makes your search results worse. It's like adding a bunch of junk to your memory, making it harder to find the good stuff.
The Cost: This "junk data" pollutes your long-term memory, which makes search results less accurate and increases your storage costs. The more irrelevant data you have, the harder it is for your system to find the information it actually needs.
The Solution: We need to add a "Relevance Filter" to stop the noise. This filter will check the message content before ingesting it. Only meaningful messages will be added to the vector database. We're going to skip messages that are too short to be relevant.
The Fix: Add this code in server/routes.ts before you call ragService.ingestMessage:
// Only ingest if meaningful content exists
if (userMessage.content.length > 15) { // Skip short greetings
ragService.ingestMessage(...).catch(...)
}
- Explanation: This code checks the
content.length. If the message is longer than 15 characters, it is considered meaningful and ingested. You can adjust the length as needed, but this is a good starting point.
2. The "Wait-to-Speak" Latency - Eliminate the Silence
The Problem: Your system generates the entire text response before sending it to the Text-to-Speech (TTS) engine. This means users have to wait a few seconds of dead silence after the text appears before they hear anything. It's like a movie where the sound starts a few minutes after the picture. Annoying, right?
The Cost: Users experience a delay, which makes the AI feel less responsive and natural. It creates a disconnect.
The Solution: Implement a "Sentence Buffer." The goal is to get the AI to start speaking while the rest of the text is being generated. This is a game-changer for user experience.
- Current:
LLM -> Full Text -> TTS -> Audio - Optimized:
LLM -> "Hello," -> TTS -> Audio (User hears "Hello" while LLM writes the rest)
We need to modify the server to stream the text in smaller chunks, allowing the TTS engine to start processing sooner.
3. Redundant "Guest" Tools - Stop Wasting Resources
The Problem: Your code re-evaluates getToolDeclarations(authStatus.isAuthenticated) on every single turn of the agentic loop. That means that the system is doing the same work over and over again. This is wasteful and slow.
The Cost: While it may not be a big deal now, rebuilding this schema every single time adds unnecessary CPU overhead, especially if you have a lot of tools. It's like making the same phone call multiple times when you only need to call once.
The Solution: Calculate the tools once at the beginning and reuse the object.
The Fix:
// Move this OUTSIDE the agentic loop
const toolsConfig = [{ functionDeclarations: getToolDeclarations(isAuthenticated) }];
// Use 'toolsConfig' inside the loop instead of calling getToolDeclarations() again
- Explanation: By calculating
toolsConfigoutside the loop, we avoid repeatedly generating the tool declarations. Inside the loop, you can simply use the pre-calculatedtoolsConfigobject, which is much more efficient.
Audio Settings Dashboard and "Show Thinking" Bug - Bonus!
As a bonus, I also created an Audio Settings Dashboard to give you more control over your audio experience! This lets you manage your voice settings, test SSML, and tweak verbosity levels.
1. New Page: client/src/pages/audio-settings.tsx
This page is your command center for all things audio. It's where you'll be able to control your AI's voice, test out SSML expressions, and adjust the verbosity of your AI's responses.
import { useEffect, useState } from "react";
import { useTTS, VerbosityMode } from "@/contexts/tts-context";
import { Card, CardContent, CardDescription, CardHeader, CardTitle } from "@/components/ui/card";
import { Button } from "@/components/ui/button";
import { Switch } from "@/components/ui/switch";
import { Label } from "@/components/ui/label";
import { Slider } from "@/components/ui/slider";
import { Textarea } from "@/components/ui/textarea";
import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
import { Play, Square, Wand2, Volume2, Mic } from "lucide-react";
import { useToast } from "@/hooks/use-toast";
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from "@/components/ui/select";
// Hardcoded list of Google Neural2 voices (matching server/integrations/expressive-tts.ts)
const VOICES = [
{ id: "Kore", gender: "Female", style: "Calm, Professional", code: "en-US-Neural2-C" },
{ id: "Puck", gender: "Male", style: "Deep, Assertive", code: "en-US-Neural2-D" },
{ id: "Charon", gender: "Male", style: "Steady, News", code: "en-US-Neural2-A" },
{ id: "Fenrir", gender: "Male", style: "Energetic", code: "en-US-Neural2-J" },
{ id: "Aoede", gender: "Female", style: "Soft, Storyteller", code: "en-US-Neural2-E" },
{ id: "Leda", gender: "Female", style: "Warm, Conversational", code: "en-US-Neural2-F" },
{ id: "Orus", gender: "Male", style: "Authoritative", code: "en-US-Neural2-I" },
{ id: "Zephyr", gender: "Female", style: "Bright, Assistant", code: "en-US-Neural2-H" },
];
const SSML_EXAMPLES = {
basic: "Hello! I am Meowstik, your AI assistant.",
pause: "Wait... <break time=\"1s\"/> did you hear that?",
emphasis: "I <emphasis level=\"strong\">really</emphasis> need you to focus.",
speed: "I can speak <prosody rate=\"slow\">very slowly</prosody> or <prosody rate=\"fast\">very quickly</prosody>.",
whisper: "<prosody volume=\"soft\">This is a secret.</prosody>"
};
export default function AudioSettings() {
const {
isMuted, toggleMuted,
verbosityMode, setVerbosityMode,
speak, stopSpeaking, isSpeaking,
unlockAudio, isAudioUnlocked
} = useTTS();
const [testText, setTestText] = useState(SSML_EXAMPLES.basic);
const [selectedVoice, setSelectedVoice] = useState("Kore");
const { toast } = useToast();
// Ensure audio context is unlocked on interaction
useEffect(() => {
const handleInteraction = () => {
if (!isAudioUnlocked) unlockAudio();
};
window.addEventListener('click', handleInteraction);
return () => window.removeEventListener('click', handleInteraction);
}, [isAudioUnlocked, unlockAudio]);
const handleSpeak = () => {
if (isMuted) {
toast({ title: "Audio Muted", description: "Unmute global audio to test speech.", variant: "destructive" });
return;
}
// Note: To support specific voice selection for testing,
// we would need to update the speak() context method to accept a voiceId.
// For now, this uses the default server voice.
speak(testText);
};
return (
<div className="container mx-auto py-8 max-w-4xl space-y-8">
<div className="flex flex-col space-y-2">
<h1 className="text-3xl font-bold tracking-tight">Audio & Voice</h1>
<p className="text-muted-foreground">Configure synthesis, expressiveness, and speech behavior.</p>
</div>
{/* Global Controls */}
<Card>
<CardHeader>
<CardTitle>Master Settings</CardTitle>
<CardDescription>Global controls for audio output.</CardDescription>
</CardHeader>
<CardContent className="space-y-6">
<div className="flex items-center justify-between">
<div className="space-y-0.5">
<Label className="text-base">Master Audio</Label>
<p className="text-sm text-muted-foreground">Enable or disable all speech output.</p>
</div>
<Switch checked={!isMuted} onCheckedChange={toggleMuted} />
</div>
<div className="space-y-4">
<div className="flex justify-between">
<Label>Verbosity Level: <span className="font-bold uppercase text-primary">{verbosityMode}</span></Label>
</div>
<div className="pt-2 px-2">
<Slider
defaultValue={[
verbosityMode === "mute" ? 0 :
verbosityMode === "quiet" ? 33 :
verbosityMode === "verbose" ? 66 : 100
]}
max={100}
step={33}
onValueChange={(vals) => {
const v = vals[0];
if (v < 15) setVerbosityMode("mute");
else if (v < 50) setVerbosityMode("quiet");
else if (v < 85) setVerbosityMode("verbose");
else setVerbosityMode("experimental");
}}
/>
<div className="flex justify-between text-xs text-muted-foreground mt-2">
<span>Mute</span>
<span>Quiet (Tools Only)</span>
<span>Verbose (Full)</span>
<span>Experimental</span>
</div>
</div>
</div>
</CardContent>
</Card>
{/* Voice Explorer */}
<div className="grid md:grid-cols-2 gap-6">
<Card className="h-full">
<CardHeader>
<CardTitle>Voice Selection</CardTitle>
<CardDescription>Choose the persona for your AI.</CardDescription>
</CardHeader>
<CardContent>
<div className="grid grid-cols-1 gap-3">
{VOICES.map(voice => (
<div
key={voice.id}
onClick={() => setSelectedVoice(voice.id)}
className={`flex items-center justify-between p-3 rounded-lg border cursor-pointer transition-all ${selectedVoice === voice.id ? 'border-primary bg-primary/5 ring-1 ring-primary' : 'hover:bg-muted'}`}
>
<div className="flex items-center gap-3">
<div className={`p-2 rounded-full ${selectedVoice === voice.id ? 'bg-primary text-primary-foreground' : 'bg-muted'}`}>
{voice.gender === "Male" ? <Volume2 className="h-4 w-4" /> : <Mic className="h-4 w-4" />}
</div>
<div>
<div className="font-medium">{voice.id}</div>
<div className="text-xs text-muted-foreground">{voice.style}</div>
</div>
</div>
{selectedVoice === voice.id && <div className="text-xs font-bold text-primary">Active</div>}
</div>
))}
</div>
</CardContent>
</Card>
{/* SSML Playground */}
<Card className="h-full flex flex-col">
<CardHeader>
<CardTitle>SSML Laboratory</CardTitle>
<CardDescription>Test voice expressiveness with SSML tags.</CardDescription>
</CardHeader>
<CardContent className="flex-1 space-y-4">
<Tabs defaultValue="basic" onValueChange={(val) => setTestText(SSML_EXAMPLES[val as keyof typeof SSML_EXAMPLES])}>
<TabsList className="grid grid-cols-5 w-full h-auto flex-wrap">
<TabsTrigger value="basic">Basic</TabsTrigger>
<TabsTrigger value="pause">Pause</TabsTrigger>
<TabsTrigger value="emphasis">Bold</TabsTrigger>
<TabsTrigger value="speed">Speed</TabsTrigger>
<TabsTrigger value="whisper">Soft</TabsTrigger>
</TabsList>
</Tabs>
<Textarea
value={testText}
onChange={(e) => setTestText(e.target.value)}
className="font-mono text-sm min-h-[150px] resize-none"
placeholder="Type text or SSML here..."
/>
<div className="flex justify-between items-center bg-muted/50 p-3 rounded-md">
<div className="text-xs text-muted-foreground">
Current Voice: <span className="font-mono font-bold text-foreground">{selectedVoice}</span>
</div>
<div className="flex gap-2">
{isSpeaking ? (
<Button variant="destructive" size="sm" onClick={stopSpeaking}>
<Square className="h-4 w-4 mr-2" /> Stop
</Button>
) : (
<Button size="sm" onClick={handleSpeak}>
<Play className="h-4 w-4 mr-2" /> Speak
</Button>
)}
</div>
</div>
</CardContent>
</Card>
</div>
</div>
);
}
2. Update client/src/App.tsx
Add this to the app, so you can access it at /audio
// 1. Add import at the top
import AudioSettingsPage from "@/pages/audio-settings";
// 2. Add route inside the Switch component (near /settings)
<Route path="/audio">
{() => <ProtectedRoute><AudioSettingsPage /></ProtectedRoute>}
</Route>
Next Steps
That's it, guys! Implement these changes, and you'll see a big difference in the user experience and the efficiency of your system. You'll be saving space, reducing latency, and overall, making your AI sing.
Let me know if you have any questions, and happy coding! We're always here to help you get the most out of your AI project. Keep experimenting, keep learning, and keep building awesome stuff!