01 // The Feedback Loop
02 // The Eye (Vision Stack)
- → Primary: Nvidia Nemotron 12B
A specialized, heavily quantized vision model optimized for edge detection and scene composition. It provides the "raw" data of what is physically present in the frame.
- → Fallback: Llama 3.2 11B
If the primary eye fails, the system fails over to Meta's multimodal model for a second opinion.
- → Constraint
Models are forced to output strictly structured JSON, analyzing "Studium" (cultural context) and "Punctum" (emotional prick).
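The failover and the structured-output constraint above might be wired together as in the sketch below. `queryNemotron` and `queryLlama` are hypothetical placeholders for the actual model clients, and the lowercase `studium`/`punctum` field names are an assumed response schema, not a confirmed one.

```javascript
// Validate that a model reply matches the assumed structured shape.
function parseVisionReply(raw) {
  const data = JSON.parse(raw); // throws on malformed JSON
  if (typeof data.studium !== "string" || typeof data.punctum !== "string") {
    throw new Error("Missing 'studium' or 'punctum' field");
  }
  return data;
}

// Try the primary eye first; on any failure (network, bad JSON, missing
// fields), fall back to the secondary model for a second opinion.
async function analyzeFrame(imageB64, queryNemotron, queryLlama) {
  try {
    return parseVisionReply(await queryNemotron(imageB64));
  } catch {
    return parseVisionReply(await queryLlama(imageB64));
  }
}
```

Because both paths run through the same validator, a primary model that returns syntactically valid but schema-violating JSON also triggers the fallback.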
// The Vision Prompt (Simplified)
{
  "role": "system",
  "content": "Identify the 'Punctum' — the specific detail that pricks the viewer. Ignore the general subject."
}

03 // The Mind (Cognition Stack)
- → Primary: Gemini 2.0 Flash
Used for high-speed semantic extraction. It reads user input (the "memory") and extracts 3 distinct emotional keywords.
- → Secondary: Mistral Small 24B
A distinct "personality" used for the poetic synthesis, preventing the tone from becoming too corporate or sterile.
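Taken together, the two stages might hand off as in the sketch below. `callGemini` and `callMistral` are hypothetical stand-ins for the real API clients, and the prompt strings are illustrative only; the keyword trimming mirrors the extraction logic shown next.

```javascript
// Two-stage cognition: fast extraction, then poetic synthesis.
async function composeReflection(memoryText, callGemini, callMistral) {
  // Stage 1: high-speed semantic extraction (Gemini 2.0 Flash),
  // expected to return a comma-separated list of emotional keywords.
  const raw = await callGemini(
    `Extract 3 distinct emotional keywords from: ${memoryText}`
  );
  const keywords = raw
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0)
    .slice(0, 3);

  // Stage 2: synthesis by a distinct "personality" (Mistral Small 24B)
  // to keep the tone from drifting corporate or sterile.
  return callMistral(
    `Write a short, non-corporate reflection using: ${keywords.join(", ")}`
  );
}
```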
// The Extraction Logic
// Split the comma-separated memory, trim whitespace,
// drop empties, and keep at most three keywords.
const keywords = content
  .split(',')
  .map(k => k.trim())
  .filter(k => k.length > 0)
  .slice(0, 3);
// Result: ["Nostalgia", "Decay", "Warmth"]

04 // The Consensus Engine
The heart of the system is the Consensus Score. We do not assume the AI is correct. Instead, we measure the distance between what the machine sees and what the human feels.
*Simplified representation. Actual calculation uses LLM-based semantic analysis to determine contextual alignment on a scale of 0-100.
- → High Score (80-100)
Universal Alignment. The image translates perfectly across biological and silicon substrates.
- → Mid Score (40-79)
Subjective Variance. The AI sees the object, but the Human feels the memory.
- → Low Score (0-39)
Human Singularity. The emotion is so specific to the user's history that the machine fails to compute it.
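As a minimal sketch, the banding and a naive stand-in score can be expressed as follows. The real system uses LLM-based semantic analysis rather than keyword overlap, so `overlapScore` here is illustrative only.

```javascript
// Map a 0-100 consensus score to its band.
function consensusBand(score) {
  if (score >= 80) return "Universal Alignment";
  if (score >= 40) return "Subjective Variance";
  return "Human Singularity";
}

// Naive stand-in score: the fraction of the human's keywords that the
// machine also reported, scaled to 0-100 (case-insensitive).
function overlapScore(machineKeywords, humanKeywords) {
  const machine = new Set(machineKeywords.map((k) => k.toLowerCase()));
  const hits = humanKeywords.filter((k) => machine.has(k.toLowerCase()));
  return Math.round((hits.length / humanKeywords.length) * 100);
}
```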