Executive Summary / Key Takeaways
- Àkàndé is an open-source Python voice assistant that chains OpenAI Whisper speech-to-text, GPT-4 chat completions, a local SQLite response cache, and fpdf2 PDF export into a single voice-driven workflow requiring no cloud storage and no local AI model weights.
- The SQLite cache stores SHA-256 hashes of normalised query strings mapped to raw API response text; cache hits cost zero tokens and return in under 10 ms, making repeated queries (such as reviewing a decision from earlier in a meeting) essentially free.
- Multi-turn conversation is maintained by building the messages list in memory and passing it on every Chat Completions API call. The model receives the full session history, so it can refer to earlier exchanges, at the cost of token usage that grows with each turn.
- PDF summary generation serialises the session messages list to a formatted fpdf2 document: user turns and assistant turns are labelled, a session timestamp is added to the header, and automatic pagination handles long sessions; the file is written to the local filesystem, not uploaded.
- Privacy boundary: only the recorded audio for transcription, the live query, and the session history (up to the context window limit) leave the device, and they go only to OpenAI's API. Cached responses and exported PDFs stay on the local filesystem.
Àkàndé is an open-source Python voice assistant built around three composable components: OpenAI Whisper for speech recognition, the GPT-4 Chat Completions API for language understanding and generation, and a local SQLite database for response caching and session persistence. The result is a voice-driven workflow that can be run on a laptop without local model weights, offline storage infrastructure, or a container stack.
This article describes the technical architecture of each component, the design decisions around caching and multi-turn context, and the PDF export pipeline.
Pipeline Overview
A single Àkàndé interaction follows this sequence:
- Audio capture: the user speaks; the application records audio to a temporary WAV file using sounddevice or a compatible audio library.
- Speech-to-text: the WAV file is submitted to openai.audio.transcriptions.create() (Whisper API); the transcript is returned as a plain string.
- Cache lookup: the transcript is normalised (lowercased, whitespace-collapsed) and SHA-256 hashed; the hash is looked up in the local SQLite response_cache table.
- API call or cache hit: on a hit, the cached response is used and no Chat Completions call is made; on a miss, the transcript is appended to the session messages list and sent to openai.chat.completions.create(), and the response text is stored in the cache.
- Text-to-speech: the response text is converted to audio using the openai.audio.speech.create() endpoint (TTS) or a local TTS library, and played back.
- PDF export (on demand): the full messages list is serialised to a formatted fpdf2 document and written to disk.
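The cache lookup and the API-call-or-cache-hit steps can be sketched as one function. This is a minimal illustration under stated assumptions, not the project's actual code: handle_turn(), ask_model, and the in-memory dict used as a cache are hypothetical stand-ins for the SQLite table and the Chat Completions call.

```python
import hashlib
from typing import Callable


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace, as in the cache lookup step."""
    return " ".join(text.lower().split())


def handle_turn(
    transcript: str,
    messages: list[dict],
    cache: dict[str, str],
    ask_model: Callable[[list[dict]], str],
) -> tuple[str, bool]:
    """Return (response_text, was_cache_hit) for one transcribed query.

    `cache` stands in for the response_cache table and `ask_model` for the
    Chat Completions call; both are assumptions made for this sketch.
    """
    key = hashlib.sha256(normalise(transcript).encode()).hexdigest()
    if key in cache:
        return cache[key], True  # hit: the model is not called

    # Miss: extend the session history, call the model, store the reply.
    messages.append({"role": "user", "content": transcript})
    reply = ask_model(messages)
    messages.append({"role": "assistant", "content": reply})
    cache[key] = reply
    return reply, False


# Usage with a fake model so the sketch runs without network access.
calls: list[int] = []


def fake_model(msgs: list[dict]) -> str:
    calls.append(len(msgs))
    return "Budget review moved to Friday."


history: list[dict] = [{"role": "system", "content": "You are Àkàndé."}]
store: dict[str, str] = {}
first = handle_turn("When is the budget review?", history, store, fake_model)
second = handle_turn("  when is the BUDGET review? ", history, store, fake_model)
```

Passing the model call in as a parameter keeps the decision logic testable offline; in a real run, ask_model would wrap the Chat Completions request.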
OpenAI Integration: Chat Completions and Whisper
Àkàndé uses the openai Python SDK for both speech recognition and text generation. The Whisper transcription call:
import openai

with open(audio_file_path, "rb") as f:
    transcript = openai.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        # language is omitted so that Whisper auto-detects it
    )
user_text = transcript.text
The Chat Completions call maintains a session-scoped messages list:
messages.append({"role": "user", "content": user_text})
response = openai.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=messages,
    temperature=0.2,
    max_tokens=1024,
)
assistant_text = response.choices[0].message.content
messages.append({"role": "assistant", "content": assistant_text})
The system prompt is prepended once at session start and controls Àkàndé's persona, output format, and any domain-specific constraints:
messages = [
    {
        "role": "system",
        "content": (
            "You are Àkàndé, a concise executive assistant. "
            "Respond in plain prose. Do not use markdown. "
            "If asked to summarise, produce three bullet points maximum."
        ),
    }
]
Setting temperature=0.2 trades creative variation for consistency. It reduces sampling randomness without making the output fully deterministic, which suits factual queries such as recalling a decision from earlier in the session.
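Because the whole messages list is resent on every call, the prompt grows with each turn. The sketch below is a rough back-of-envelope estimate only: prompt_tokens_per_turn() is a hypothetical helper, the token counts are made-up inputs, and real counts depend on the tokenizer and on per-message framing overhead.

```python
def prompt_tokens_per_turn(
    system: int, user: int, assistant: int, turns: int
) -> list[int]:
    """Estimated prompt size sent on each turn of a session.

    On turn k the request holds the system prompt, k user messages and
    the k - 1 assistant replies received so far.
    """
    return [system + k * user + (k - 1) * assistant for k in range(1, turns + 1)]


# Hypothetical sizes: 40-token system prompt, 30-token questions,
# 120-token answers, five turns.
per_turn = prompt_tokens_per_turn(system=40, user=30, assistant=120, turns=5)
total = sum(per_turn)
```

Under these assumptions the per-turn prompt grows linearly, so the cumulative prompt tokens billed over a session grow roughly quadratically with the number of turns.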
SQLite Response Cache
The cache schema is minimal:
CREATE TABLE IF NOT EXISTS response_cache (
    query_hash TEXT PRIMARY KEY,
    response   TEXT NOT NULL,
    created_at INTEGER NOT NULL  -- Unix timestamp
);
The lookup and write path:
import hashlib
import sqlite3
import time

def _normalise(text: str) -> str:
    return " ".join(text.lower().split())

def cache_get(conn: sqlite3.Connection, query: str) -> str | None:
    h = hashlib.sha256(_normalise(query).encode()).hexdigest()
    row = conn.execute(
        "SELECT response FROM response_cache WHERE query_hash = ?", (h,)
    ).fetchone()
    return row[0] if row else None

def cache_set(conn: sqlite3.Connection, query: str, response: str) -> None:
    h = hashlib.sha256(_normalise(query).encode()).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO response_cache VALUES (?, ?, ?)",
        (h, response, int(time.time()))
    )
    conn.commit()
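INSERT OR REPLACE overwrites any existing row with the same hash, so an entry can be refreshed whenever cache_set() is called again for that query, for example when the cache is deliberately bypassed after a model upgrade. Because a cache hit skips the API call, a stale entry is otherwise only replaced after it has been evicted. A TTL-based eviction query (DELETE FROM response_cache WHERE created_at < ?) can be run on startup to bound cache size.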
Cache hit performance: a SQLite primary-key lookup on a local SSD typically returns in under 1 ms for tables up to ~100,000 rows, whereas the round trip for a live GPT-4 API call is typically 600–900 ms for short responses. For a daily briefing with a handful of repeated queries, the cache eliminates most Chat Completions calls after the first session. Transcription still runs for every spoken query, because the lookup key is derived from the transcript.
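To put those figures together, the average latency of the chat step can be estimated from the hit rate. The sketch below is illustrative arithmetic only: expected_latency_ms() is a hypothetical helper, the 60% hit rate is an assumed value, and transcription and text-to-speech time are not included.

```python
def expected_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Average chat-step latency for a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


# 1 ms for a hit and 750 ms (midpoint of 600-900 ms) for a miss follow the
# figures quoted above; the 0.6 hit rate is an assumption for illustration.
average = expected_latency_ms(hit_rate=0.6, hit_ms=1.0, miss_ms=750.0)
```

With these inputs the average comes out at roughly 300 ms, dominated almost entirely by the misses.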
PDF Summary Generation
PDF export uses fpdf2, a maintained, pure-Python PDF library:
from datetime import datetime

from fpdf import FPDF
from fpdf.enums import XPos, YPos

def export_session_pdf(messages: list[dict], output_path: str) -> None:
    pdf = FPDF()
    pdf.set_margins(20, 20, 20)  # set margins before the first page is added
    pdf.add_page()
    # Core fonts such as Helvetica only cover Latin-1, so the title uses a
    # plain hyphen; an em dash here would raise an encoding error in fpdf2.
    pdf.set_font("Helvetica", "B", 14)
    pdf.cell(0, 10, f"Àkàndé Session - {datetime.now():%Y-%m-%d %H:%M}",
             new_x=XPos.LMARGIN, new_y=YPos.NEXT)
    pdf.ln(4)
    for msg in messages:
        if msg["role"] == "system":
            continue  # the system prompt is not part of the transcript
        label = "You" if msg["role"] == "user" else "Àkàndé"
        pdf.set_font("Helvetica", "B", 10)
        pdf.cell(0, 6, label, new_x=XPos.LMARGIN, new_y=YPos.NEXT)
        pdf.set_font("Helvetica", size=10)
        pdf.multi_cell(0, 5, msg["content"])
        pdf.ln(3)
    pdf.output(output_path)
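multi_cell() handles line-wrapping, and fpdf2's automatic page breaks (enabled by default) start a new page when the text reaches the bottom margin, so long sessions are paginated without manual logic. The document uses the built-in Helvetica core font, so no font file is embedded. That keeps the output small, but it also means the file is not PDF/A-conformant, since PDF/A requires every font to be embedded.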
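One caveat with the built-in Helvetica font: fpdf2's core fonts only cover Latin-1 text, so model output containing characters such as curly quotes or emoji can raise an encoding error during export. A minimal sketch of one workaround follows, assuming it is acceptable to substitute unsupported characters. The helper to_latin1() is hypothetical and not part of fpdf2; registering a Unicode TrueType font with add_font() is the alternative when the original characters must be preserved.

```python
def to_latin1(text: str, placeholder: str = "?") -> str:
    """Replace characters that Latin-1 cannot encode.

    Common typographic characters get readable ASCII equivalents first;
    anything else outside Latin-1 becomes the placeholder.
    """
    replacements = {
        "\u2018": "'", "\u2019": "'",   # curly single quotes
        "\u201c": '"', "\u201d": '"',   # curly double quotes
        "\u2013": "-", "\u2014": "-",   # en and em dashes
        "\u2026": "...",                # ellipsis
    }
    for src, dst in replacements.items():
        text = text.replace(src, dst)
    # Latin-1 maps exactly the code points 0-255.
    return "".join(ch if ord(ch) < 256 else placeholder for ch in text)


cleaned = to_latin1("Àkàndé said: \u201cship it\u201d \u2014 today \u2705")
```

Applied to each message before it is passed to multi_cell(), a step like this keeps the export from failing on unexpected characters, at the cost of losing them in the PDF.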
Privacy Model
The privacy boundary in Àkàndé is defined by three facts:
- Audio is submitted to the Whisper API over HTTPS and is not retained by OpenAI beyond the API call (per OpenAI's API data usage policy as of February 2024).
- Chat Completions API calls transmit the session messages list, which may contain the full conversation history for multi-turn sessions.
- The SQLite database and PDF files live entirely on the local filesystem; no background sync to any cloud service occurs.
For executive use cases involving sensitive topics (M&A discussions, personnel matters, regulatory strategy), the session history transmitted to the API should be reviewed against the organisation's AI usage policy before deployment. Note that max_tokens only caps the length of the generated reply; it does not limit what is sent. To keep transmitted context within the intended disclosure scope, the messages list itself has to be trimmed or redacted before each call.
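One way to bound what each call transmits is to send only the most recent part of the session. The following is a minimal sketch under that assumption; trim_history() and max_messages are hypothetical names, and the base implementation described here sends the full list.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Return system messages plus the most recent non-system messages.

    The input list is left unchanged, so the full session is still
    available locally (for example for PDF export).
    """
    if max_messages < 0:
        raise ValueError("max_messages must be non-negative")
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    recent = dialogue[-max_messages:] if max_messages > 0 else []
    return system + recent


session = [
    {"role": "system", "content": "You are Àkàndé."},
    {"role": "user", "content": "Summarise the hiring plan."},
    {"role": "assistant", "content": "Three roles open in Q2."},
    {"role": "user", "content": "What is the budget?"},
    {"role": "assistant", "content": "Budget is unchanged."},
    {"role": "user", "content": "When is the next review?"},
]
outgoing = trim_history(session, max_messages=3)
```

The trade-off is that the model can no longer refer to exchanges that were trimmed away, so the window size is a balance between disclosure scope and conversational recall.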
Frequently Asked Questions
Does Àkàndé retain conversation history after the session ends?
The in-memory messages list is discarded when the process exits. Conversation history is only retained if the user triggers a PDF export or if a custom persistence layer is added. The SQLite cache stores query hashes and response text, not the full conversation context.
How does the cache handle queries that are similar but not identical?
The cache uses exact-match hashing on the normalised query string. Two queries that differ by a single word will produce different hashes and result in separate API calls. Semantic caching (using embedding similarity to match near-duplicate queries) would require an additional vector lookup step and is not part of the base implementation.
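The behaviour is easy to check directly. In this short sketch, query_key() is a hypothetical helper that applies the same lowercase, whitespace-collapse, and SHA-256 steps described for the cache.

```python
import hashlib


def query_key(text: str) -> str:
    """SHA-256 hex digest of the lowercased, whitespace-collapsed query."""
    return hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()


# Differences in case and spacing are normalised away...
same_a = query_key("What did we decide on pricing?")
same_b = query_key("  what did we decide   on PRICING? ")
# ...but changing a single word yields an unrelated key.
different = query_key("What did we decide about pricing?")
```

Here same_a and same_b are equal, so the second phrasing is a cache hit, while different misses and triggers a fresh API call.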
What GPT model does Àkàndé use by default?
The default is gpt-4-turbo-preview as of February 2024. The model name is a configuration parameter, so any OpenAI chat completion model can be substituted. Switching to gpt-3.5-turbo reduces API cost by approximately 20× per token but reduces reasoning quality for complex multi-step queries.
Can the PDF export format be customised?
Yes. The fpdf2 export function accepts the messages list as its only required input, so font, margins, page size, header content, and labelling can all be changed by editing the export function. fpdf2 also supports adding images, tables, and Unicode fonts, allowing richer document layouts for organisations with specific branding requirements.