V0.2 Development Log
Overview
V0.2 marks the transition from early prototyping (V0.1) to a faithful reproduction of the Generative Agents paper (Park et al., UIST 2023), followed by significant cognitive architecture enhancements. Where V0.1 explored basic two-agent dialogue with raw LLM calls and discovered fundamental limitations of naive prompt-based memory, V0.2 builds a complete simulation engine with a proper cognitive loop, tile-based world, and a rich medieval fantasy setting — Uva Village in TANAPOCIA.
Development period: April 2–3, 2026
Phase 1: Faithful Paper Reproduction
The Core Cognitive Loop
The biggest lesson from V0.1 was that rigid, repetitive dialogue stemmed from a structural problem — not a parameter tuning issue. V0.2 addresses this by implementing the full cognitive architecture from the Generative Agents paper:
- Perceive — Each agent detects nearby events within the same arena (spatial locality). Every perceived event is scored for "poignancy" (emotional significance, 1–10) by an LLM call, determining how strongly the event should impress memory.
- Retrieve — Instead of simply grabbing recent memories, V0.2 uses a three-factor scoring system:
- Recency: exponential decay — recent memories score higher
- Relevance: cosine similarity between the current context embedding and memory embeddings
- Importance: the poignancy score assigned at perception time
These three factors are min-max normalized and combined with tunable weights (`recency * 0.5 + relevance * 3.0 + importance * 2.0`). This solves the V0.1 problem of always retrieving the same memories — now retrieval adapts to what the agent is currently doing.
- Plan — A three-level decomposition system:
- Daily plan: broad goals for the day (e.g., "work at the blacksmith shop, have lunch, visit the market")
- Hourly schedule: decompose each daily goal into hour-level actions
- Per-action decomposition: further break each hourly action into 5–15 minute tasks
When an agent encounters another agent or a notable event, the plan module decides whether to engage in conversation, wait, or ignore — enabling organic social interactions rather than forced turn-taking.
- Reflect — When accumulated importance scores cross a threshold, the agent enters reflection mode:
- Generate focal questions from recent high-importance memories
- Retrieve evidence relevant to each question
- Synthesize higher-order insights stored as "thought" nodes in memory
This creates a hierarchy: raw observations → reflected thoughts → meta-reflections, giving agents progressively deeper self-understanding.
- Execute — Resolve the current plan's target location to a tile coordinate, run A* pathfinding on the collision grid, and return the next movement step along with an emoji and action description.
- Converse — Turn-by-turn dialogue with structured knowledge extraction. After a conversation ends, both agents extract key information and store it as new memory nodes.
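The three-factor retrieval score described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the memory fields, the decay base, and the time units are assumptions.

```python
import math

def minmax(xs):
    """Min-max normalize a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(memories, query_embedding, now, top_k=3,
             w_recency=0.5, w_relevance=3.0, w_importance=2.0, decay=0.995):
    # Each memory: {'last_access': time in the same units as `now`,
    #               'embedding': vector, 'poignancy': 1-10}
    recency = minmax([decay ** (now - m["last_access"]) for m in memories])
    relevance = minmax([cosine(m["embedding"], query_embedding) for m in memories])
    importance = minmax([m["poignancy"] for m in memories])
    scored = [
        (w_recency * r + w_relevance * v + w_importance * i, m)
        for r, v, i, m in zip(recency, relevance, importance, memories)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [m for _, m in scored[:top_k]]
```

With the paper-style weights, relevance dominates: an old but on-topic memory can outrank a fresh but irrelevant one.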
Memory Architecture
V0.2 implements three distinct memory structures, a major upgrade from V0.1's flat memory stream:
- Associative Memory: The core storage for `ConceptNode` objects (events, thoughts, chat records). Each node stores SPO (subject-predicate-object) triples, keyword indexes, and embedding vectors. This replaces V0.1's simple timestamp-based memory list.
- Spatial Memory: A hierarchical tree — `world → sector → arena → game_objects` — that gives agents an understanding of the world's geography. Agents know which buildings exist, what rooms are inside them, and what objects are in each room.
- Scratch (Working Memory): Over 40 fields capturing the agent's current state — identity, daily plan, hourly schedule, current action, conversation state, reflection weights, and thresholds. This is the "system prompt" equivalent, but dynamic and updated every step.
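The spatial memory hierarchy can be pictured as nested dictionaries. The class and method names below are illustrative, not the project's actual API:

```python
# Sketch of the spatial memory tree: world -> sector -> arena -> game_objects.
class SpatialMemory:
    def __init__(self):
        self.tree = {}  # {world: {sector: {arena: [game_objects]}}}

    def add(self, world, sector, arena, game_object):
        arenas = self.tree.setdefault(world, {}).setdefault(sector, {})
        arenas.setdefault(arena, []).append(game_object)

    def arenas_in(self, world, sector):
        """The rooms an agent knows about inside a given building."""
        return list(self.tree.get(world, {}).get(sector, {}).keys())

    def address_of(self, game_object):
        """Resolve an object to a colon-joined address string."""
        for world, sectors in self.tree.items():
            for sector, arenas in sectors.items():
                for arena, objects in arenas.items():
                    if game_object in objects:
                        return f"{world}:{sector}:{arena}:{game_object}"
        return None
```

An agent only sees the subtree it has actually visited, so two agents can hold different maps of the same village.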
Simulation Engine
- `WorldEngine` manages a global clock and all persona instances
- Each simulation step = 10 seconds of in-world time (configurable)
- 144 steps = 1 game day
- `SimulationRecorder` writes `master_movement.json` for later replay
- Simulation and replay are completely separated — the CLI runner (`backend/simulate.py`) produces data headlessly; the frontend replays it with playback controls
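The clock/recorder split can be sketched like this. The class shape, field names, and output format here are assumptions for illustration, not the actual `WorldEngine` API:

```python
import json
from datetime import datetime, timedelta

class WorldEngine:
    """Illustrative sketch: advance an in-world clock by a configurable
    step size and record per-step positions for later headless replay."""
    def __init__(self, start, sec_per_step=10):
        self.clock = start
        self.sec_per_step = sec_per_step
        self.step = 0
        self.movements = {}  # step -> {persona_name: (x, y)}

    def advance(self, positions):
        """Record one simulation step and advance the in-world clock."""
        self.movements[self.step] = dict(positions)
        self.clock += timedelta(seconds=self.sec_per_step)
        self.step += 1

    def dump(self):
        """Serialize movements in the spirit of master_movement.json."""
        return json.dumps({str(s): p for s, p in self.movements.items()})

engine = WorldEngine(datetime(2026, 4, 2, 8, 0, 0))
engine.advance({"Isabella": (10, 12)})
engine.advance({"Isabella": (10, 13)})
```

Because the recorder writes plain JSON keyed by step, the frontend can scrub, pause, and change playback speed without touching the simulation.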
World: Uva Village
- 25 agents with distinct personas, occupations, relationships, and bootstrap memories
- 140 × 100 tile grid, 32 × 32 px per tile
- 5 map layers: collision, sector, arena, game_object, spawning
- 285 named location addresses
- Medieval fantasy setting in the world of TANAPOCIA
Phase 2: Infrastructure & Tooling
Tiled Map Editor Integration
V0.1 had no proper map editing workflow. V0.2 introduces a unified pipeline using the Tiled Map Editor:
- Visual map creation with CuteRPG pixel art tileset
- Automatic extraction of collision, sector, arena, and game_object layers from Tiled `.tmj` files
- A spawning layer system for defining initial agent positions
- Functional layers (display vs. data) cleanly separated
This means new worlds can be designed visually rather than editing CSV files by hand.
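Tiled's `.tmj` export is plain JSON, with each tile layer storing a flat, row-major `data` array. A minimal sketch of pulling one named layer into a 2D grid (the toy map below stands in for a real export; the layer names follow this project's convention):

```python
import json

def extract_layer(tmj: dict, layer_name: str):
    """Find a named tile layer in a Tiled .tmj document and reshape its
    flat data array into rows (Tiled stores tiles row-major)."""
    for layer in tmj.get("layers", []):
        if layer.get("name") == layer_name:
            w = layer["width"]
            data = layer["data"]
            return [data[i:i + w] for i in range(0, len(data), w)]
    raise KeyError(f"no layer named {layer_name!r}")

# A toy 3x2 map standing in for a real exported file.
tmj = json.loads("""{
  "layers": [
    {"name": "collision", "width": 3, "height": 2,
     "data": [0, 1, 0, 1, 1, 0]}
  ]
}""")
grid = extract_layer(tmj, "collision")
```

The same reshape works for the sector, arena, game_object, and spawning layers, which is what makes the pipeline uniform.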
World Version Management
A registry system (`world_registry.json`) binds each simulation experiment to a specific world version (scene + version pair). This ensures reproducibility — you can always trace which map and agent configuration produced a given simulation run.
Prompt Externalization
All 24 LLM prompt templates were extracted from inline Python strings to external template files (`backend/data/prompts/`). This makes prompts:
- Editable without touching code
- Versionable and diffable
- Shareable across modules
Test Suite
183 tests covering:
- Unit tests: each cognitive module tested in isolation with mock LLM
- Integration tests: cross-module interface tests
- Regression tests: backward compatibility with the original Smallville world data
Frontend: React + Phaser 3 Replay Viewer
A complete rewrite from scratch:
- React 19 for UI (playback controls, persona list, state inspection)
- Phaser 3 for tile map rendering and sprite animation
- Instant replay from `master_movement.json` with play/pause/speed controls
- WebSocket support for live simulation streaming
Phase 3: Cognitive Architecture Enhancements (Toward ALICEv1)
This is where V0.2 diverges from the original paper and begins building toward the ALICEv1 vision.
Memory Split: Short-term vs. Long-term
The original paper treats all memories equally. V0.2 introduces a biologically inspired split:
- Short-term memory: recent events within a configurable time window, readily accessible
- Long-term memory: older memories that have been consolidated, requiring stronger retrieval signals to surface
This means agents don't treat a conversation from 3 days ago with the same immediacy as something that happened 5 minutes ago — a subtle but important step toward realistic cognitive behavior.
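The split can be sketched as a simple partition over a configurable time window. The 24-hour window and the field names are assumptions for illustration:

```python
from datetime import datetime, timedelta

# Illustrative sketch of the short-term / long-term split.
SHORT_TERM_WINDOW = timedelta(hours=24)

def partition(memories, now):
    """Route memories by age: recent ones stay readily accessible,
    older ones need stronger retrieval signals to surface."""
    short_term, long_term = [], []
    for m in memories:
        bucket = short_term if now - m["created"] <= SHORT_TERM_WINDOW else long_term
        bucket.append(m)
    return short_term, long_term

now = datetime(2026, 4, 3, 12, 0)
mems = [
    {"desc": "chat 5 minutes ago", "created": now - timedelta(minutes=5)},
    {"desc": "festival 3 days ago", "created": now - timedelta(days=3)},
]
short_term, long_term = partition(mems, now)
```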
Dream Module (Memory Consolidation)
Inspired by the role of sleep in human memory consolidation:
- When an agent "sleeps" at the end of a simulated day, the Dream module activates
- It reviews the day's significant events, consolidates important memories into long-term storage
- It can also trigger Ego evolution — updating the agent's self-concept based on accumulated experiences
This addresses one of V0.1's core complaints: that agents are "frozen in an instant." While we still can't update LLM weights, we can evolve the agent's identity, goals, and self-understanding through the Dream cycle.
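Stripped to its skeleton, the nightly cycle looks like the sketch below. The threshold and data shapes are assumptions; in the real system an LLM reviews the kept memories and proposes the Ego updates noted in the comment:

```python
# Hedged sketch of the Dream cycle: at day's end, promote the day's most
# significant memories into long-term storage.
def dream(day_memories, long_term, importance_threshold=7):
    consolidated = [m for m in day_memories
                    if m["poignancy"] >= importance_threshold]
    long_term.extend(consolidated)
    # The real module would also hand these to an LLM to propose Ego
    # evolution — updates to identity, values, and goals.
    return [m["desc"] for m in consolidated]

long_term = []
day = [
    {"desc": "greeted a neighbor", "poignancy": 2},
    {"desc": "lost the smithy contract", "poignancy": 9},
]
kept = dream(day, long_term)
```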
Ego Evolution
Each agent has an Ego — a structured self-concept including identity, values, and goals. The Dream module can propose updates to the Ego based on significant experiences:
- An agent who repeatedly fails at a task may lower its confidence
- An agent who discovers new information may update its worldview
- Social interactions can shift relationship attitudes
This is a step toward the "shapeable values" concept from the SAO/Alice inspiration.
Ability Check System
Not every agent can do everything. V0.2 introduces LLM-based ability validation:
- Before executing an action, the system checks whether the agent's skills, age, and physical condition allow it
- A child cannot forge a sword; an elderly scholar cannot run long distances
- Checks are calibrated to medieval-era standards (relaxed from modern expectations)
This adds a layer of realism and prevents absurd behaviors that break immersion.
Knowledge Enhancement & Scene Injection
- World knowledge system: agents can possess different subsets of world knowledge (common sense, history, geography, culture, morality, rules) with different mastery levels
- Scene injection: environmental descriptions are injected into the agent's perception based on their current location, time of day, and weather — making agents aware of and responsive to their surroundings
Ebbinghaus Forgetting Curve
Memories now decay following a curve inspired by Ebbinghaus's forgetting research:
- Unreinforced memories gradually lose retrieval strength
- Memories that are repeatedly accessed or emotionally significant decay more slowly
- The forgetting is deterministic (reproducible across runs) rather than random
This prevents the "perfect memory" problem where agents remember every trivial detail forever.
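A common formalization of the forgetting curve is `R = exp(-t / S)`, where retention `R` falls with elapsed time `t` and a stability term `S` grows with each reinforcement. The sketch below uses that textbook form; the project's exact formula and constants may differ:

```python
import math

def retention(hours_elapsed, stability):
    """Ebbinghaus-style retention: R = exp(-t / S). Purely a function of
    its inputs — deterministic, so replays are reproducible."""
    return math.exp(-hours_elapsed / stability)

def reinforce(stability, boost=2.0):
    """Each access (or high emotional significance) raises stability,
    so the memory decays more slowly from then on."""
    return stability * boost

fresh = retention(24, stability=24)      # one day old, base stability
boosted = retention(24, reinforce(24))   # same age, but reinforced once
```

Multiplying a memory's retrieval strength by `R` is enough to let trivial observations fade while rehearsed or emotional ones persist.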
Dissent & Rebellion Mechanism
A unique addition reflecting the TANAPOCIA world's themes:
- Agents can develop doubts about established rules or authority
- When internal conflict (between personal experience and imposed beliefs) exceeds a threshold, agents may begin to question or resist
- This creates the potential for organic social dynamics — heretics, reformers, rebels — emerging from individual cognitive processes rather than scripted events
Technical Improvements
- All magic numbers extracted to `backend/constants.py` — no hardcoded values in cognitive modules
- Address parsing centralized in `backend/address.py` (replacing fragile string-slice operations)
- Save-file migration system (`backend/migration.py`) for backward compatibility when new fields are added
- LLM client strips `<think>...</think>` tags from Qwen3 output automatically
- Module-boundary contracts defined in `backend/interfaces.py` using dataclasses
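The `<think>` stripping amounts to a non-greedy regex over the raw model output. A minimal sketch (the actual client code may differ):

```python
import re

# Remove Qwen3 reasoning blocks before the text reaches cognitive modules.
# DOTALL lets .*? span newlines; the non-greedy quantifier stops at the
# first closing tag so multiple blocks are each removed.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(text: str) -> str:
    return THINK_RE.sub("", text).strip()

out = strip_think(
    "<think>\nplan the day step by step...\n</think>\n"
    "Wake at 6am, then open the shop."
)
```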
What's Next
V0.2 establishes the complete simulation infrastructure and begins the cognitive enhancements that will define ALICEv1. The road ahead includes:
- Emotion system: multi-dimensional emotional state that influences perception, planning, and social interaction
- Relationship dynamics: trust, affection, rivalry evolving through repeated interactions
- Player intervention: allowing a human to "dive in" and interact with the simulated world
- Larger worlds: scaling beyond 25 agents to hundreds, testing emergence at scale
- Continuous learning exploration: the long-term dream of evolving the LLM itself, as inspired by the Alice concept from Sword Art Online