Purpose
This design describes a PocketFlow Creator workflow that accepts a software project folder containing a docs/ subfolder and a project specification, then uses a queue-driven RALF loop to implement the project in ordered, testable increments.
The central idea is simple:
The LLM may propose.
The queue schedules.
The gates decide.
Git preserves.
The human approves high-risk changes.
This prevents the system from becoming a vague autonomous agent that edits code until it feels done. Instead, it becomes a controlled development workflow: read the specification, extract requirements, create a work queue, implement one bounded task, run gates, commit, update traceability, and repeat.
Expected Project Layout
The flow expects a project folder shaped approximately like this:
my_project/
├── docs/
│   ├── architecture.md
│   ├── requirements.md
│   ├── design_notes.md
│   └── ...
├── project_spec.md
├── pyproject.toml / package.json / CMakeLists.txt / etc.
├── src/
├── tests/
└── README.md
The project_spec.md file is the top-level source of truth. Files under docs/ provide supporting architecture, requirements, design notes, and constraints.
What the Flow Does
At a high level, the flow:
- Validates the project folder.
- Reads project_spec.md and the docs/ folder.
- Builds a compact project context.
- Extracts requirements, constraints, assumptions, and open questions.
- Converts the requirements into ordered implementation tasks.
- Places those tasks into a main FIFO work queue.
- Uses a repair stack for immediate gate failures.
- Uses a deferred stack for blocked or risky tasks.
- Runs one bounded RALF loop per task.
- Runs objective gates such as tests, linting, type checks, and builds.
- Reviews the diff for scope and safety.
- Commits passing work.
- Updates traceability records.
- Repeats until the queue is empty or human intervention is required.
Full Flow Diagram
See assets/diagrams/software_dev_ralf_flow.mmd.
flowchart TD
A[Project Input] --> B[Validate Project Layout]
B -->|valid| C[Scan Project Files]
B -->|missing_spec_or_docs| Z[Human Repair Request]
C --> D[Load Spec and Docs]
D --> E[Build Project Context]
E --> F[Analyze Spec]
F --> G[Create Ordered Work Queue]
G --> H{Queue Empty?}
H -->|yes| Y[Generate Final Report]
H -->|no| I[Pop Next Work Item]
I --> J[Task Feasibility Check]
J -->|ready| K[Create RALF Prompt]
J -->|blocked| X[Push to Deferred Stack]
K --> L[RALF Implement Iteration]
L --> M[Run Quality Gates]
M -->|pass| N[Diff Review]
M -->|fail| O{Retry Limit?}
O -->|no| P[Analyze Failure]
P --> Q[Push Repair Task]
Q --> L
O -->|yes| X
N -->|safe| R[Commit Changes]
N -->|risky| S[Human Approval]
S -->|approved| R
S -->|rejected| T[Revert or Patch]
R --> U[Update Traceability]
U --> H
X --> V{Main Queue Empty?}
V -->|no| H
V -->|yes| W{Deferred Stack Empty?}
W -->|no| I2[Pop Deferred Task]
I2 --> J
W -->|yes| Y
Shared Store
The shared store is the backbone of the flow. The model should not be trusted to remember the project. Durable state belongs in the shared store and on disk.
See assets/schemas/shared_store_schema.yaml.
Major shared-store sections:
project:
  root: ""
  docs_dir: ""
  spec_path: ""
context:
  spec_text: ""
  docs: {}
  repo_map: {}
  dependency_summary: ""
planning:
  requirements: []
  assumptions: []
  constraints: []
  work_queue: []
  repair_stack: []
  deferred_stack: []
  completed_tasks: []
  blocked_tasks: []
current_task:
  id: null
  title: ""
  source_requirement_ids: []
  description: ""
  acceptance_tests: []
  risk_level: "low"
ralf:
  implementation_prompt: ""
  last_agent_output: ""
  last_gate_output: ""
  retry_count: 0
gates:
  commands: []
  results: []
  passed: false
traceability:
  requirement_to_task: {}
  task_to_commits: {}
  task_to_tests: {}
  verification_log: []
Queue and Stack Model
The flow should use three separate structures.
Main Work Queue
The main work queue is FIFO:
pop from front
append new normal tasks to back
This preserves the planner’s intended development order.
Repair Stack
The repair stack is LIFO:
push newest gate failure on top
pop newest repair first
This keeps the loop focused on the most recent failure while the relevant diff is still small.
Deferred Stack
The deferred stack holds blocked, unclear, or risky tasks. These should be revisited only after the main queue drains or after a dependency is satisfied.
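As a rough sketch of the "revisit after a dependency is satisfied" rule, the helper below moves deferred tasks back onto the main queue once everything they depend on is done. The function name `promote_ready_deferred` is hypothetical, and it assumes `completed_tasks` holds task IDs and that each task carries a `depends_on` list like the example task shown later in this post.

```python
def promote_ready_deferred(planning: dict) -> list:
    """Move deferred tasks whose dependencies are all completed to the main queue.

    Assumption: planning["completed_tasks"] is a list of task IDs.
    Returns the IDs of the tasks that were promoted.
    """
    completed = set(planning["completed_tasks"])
    still_deferred = []
    promoted_ids = []
    for task in planning["deferred_stack"]:
        deps = task.get("depends_on", [])
        # Tasks without listed dependencies were deferred for another reason
        # (unclear or risky), so they stay deferred until the queue drains.
        if deps and all(dep in completed for dep in deps):
            planning["work_queue"].append(task)  # FIFO: rejoin at the back
            promoted_ids.append(task["id"])
        else:
            still_deferred.append(task)
    planning["deferred_stack"] = still_deferred
    return promoted_ids
```

Promoted tasks go to the back of the main queue rather than the front, so the planner's remaining order is not disturbed.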
Node Design
1. ProjectInput
Suggested existing node type: input/config node
Purpose: Accept project root, spec filename, and docs folder name.
Inputs:
project_root: "/path/to/project"
spec_filename: "project_spec.md"
docs_folder: "docs"
Outputs:
project.root
project.docs_dir
project.spec_path
Action:
default -> ValidateProjectLayout
2. ValidateProjectLayout
Suggested existing node type: Python/script node
Purpose: Confirm required paths exist and determine whether the project is a git repository.
Checks:
project root exists
docs/ exists
project_spec.md exists
git repo exists or can be initialized
Actions:
valid -> ScanProjectFiles
missing_spec_or_docs -> HumanRepairRequest
not_git_repo -> GitInitPrompt
Git should not be silently initialized without approval.
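A minimal sketch of this node as a plain Python function might look like the following. The function name is hypothetical, the return values reuse the action names listed above, and treating a missing project root as `missing_spec_or_docs` is an assumption, since the design does not name a separate action for that case.

```python
from pathlib import Path


def validate_project_layout(
    project_root: str,
    spec_filename: str = "project_spec.md",
    docs_folder: str = "docs",
) -> str:
    """Return the action name for the next node. Never initializes git itself."""
    root = Path(project_root)
    # Assumption: a missing root is reported the same way as a missing spec.
    if not root.is_dir():
        return "missing_spec_or_docs"
    if not (root / docs_folder).is_dir():
        return "missing_spec_or_docs"
    if not (root / spec_filename).is_file():
        return "missing_spec_or_docs"
    # .git is a directory in a normal clone and a file in a worktree,
    # so only its existence is checked.
    if not (root / ".git").exists():
        return "not_git_repo"
    return "valid"
```

The function only reports `not_git_repo`; whether to run `git init` is left to the approval step.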
3. ScanProjectFiles
Suggested existing node type: directory scanner or Python node
Purpose: Build a safe project map and detect the language stack.
Ignore:
.git/
.venv/
venv/
node_modules/
dist/
build/
__pycache__/
.cache/
.env
secrets/
Detect stack hints:
pyproject.toml -> Python
package.json -> Node/TypeScript
Cargo.toml -> Rust
build.zig -> Zig
CMakeLists.txt -> C/C++
pom.xml -> Java/Maven
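The scan could be sketched with `os.walk`, pruning ignored directories before descending into them. The function name `scan_project` and the decision to look for stack hints only at the project root are assumptions made for this example; the ignore list and hint table are the ones given above.

```python
import os

IGNORED_NAMES = {
    ".git", ".venv", "venv", "node_modules", "dist",
    "build", "__pycache__", ".cache", ".env", "secrets",
}

STACK_HINTS = {
    "pyproject.toml": "Python",
    "package.json": "Node/TypeScript",
    "Cargo.toml": "Rust",
    "build.zig": "Zig",
    "CMakeLists.txt": "C/C++",
    "pom.xml": "Java/Maven",
}


def scan_project(root):
    """Return (relative file paths, detected stack names) for a project root."""
    root = os.fspath(root)
    files, stacks = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering ignored folders.
        dirnames[:] = sorted(d for d in dirnames if d not in IGNORED_NAMES)
        for name in sorted(filenames):
            if name in IGNORED_NAMES:
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            files.append(rel.replace(os.sep, "/"))
            # Assumption: stack marker files only count at the project root.
            if dirpath == root and name in STACK_HINTS:
                stacks.append(STACK_HINTS[name])
    return files, stacks
```

Because ignored directories are pruned rather than filtered afterwards, the scanner never reads the contents of `secrets/` or `node_modules/` at all.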
4. LoadSpecAndDocs
Suggested existing node type: file reader / Markdown node
Purpose: Load project_spec.md and supporting docs.
Outputs:
context.spec_text
context.docs
Docs are treated as project truth. Contradictions between docs and spec should be flagged later.
5. BuildProjectContext
Suggested existing node type: LLM prompt node
Purpose: Create a compact repository summary.
The summary should include:
architecture
important files
test system
build commands
known constraints
likely package manager
The node should not invent missing facts.
6. AnalyzeSpec
Suggested existing node type: LLM structured-output node
Purpose: Extract requirements, constraints, assumptions, and open questions.
Output schema:
requirements:
  - id: REQ-001
    title: ""
    description: ""
    priority: must|should|could
    acceptance:
      - ""
    likely_files:
      - ""
constraints:
  - id: CON-001
    text: ""
assumptions:
  - id: ASM-001
    text: ""
open_questions:
  - id: Q-001
    text: ""
Actions:
ok -> CreateWorkQueue
needs_human_clarification -> HumanSpecReview
Implementation should not begin until open questions are resolved or explicitly converted into assumptions.
7. CreateWorkQueue
Suggested existing node type: LLM structured-output node plus Python sorting node
Purpose: Convert requirements into ordered, independently testable tasks.
Example task:
- id: TASK-002
  title: "Implement config loader"
  depends_on: ["TASK-001"]
  source_requirement_ids: ["REQ-002"]
  risk_level: medium
  files_allowed:
    - "src/**"
    - "tests/**"
    - "docs/**"
  acceptance_tests:
    - "config loader handles valid YAML"
    - "config loader rejects malformed YAML"
Recommended ordering:
- Project skeleton.
- Tests and fixtures.
- Core interfaces.
- Smallest vertical slice.
- Feature increments.
- Error handling.
- Documentation.
- Packaging.
- Final verification.
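The Python sorting half of this node could be sketched as a dependency-aware ordering that otherwise keeps the planner's sequence. The function name `order_tasks` is hypothetical; it assumes each task is a dict with an `id` and an optional `depends_on` list, as in the example task above.

```python
def order_tasks(tasks: list) -> list:
    """Order tasks so dependencies come first, keeping planner order otherwise."""
    known_ids = {task["id"] for task in tasks}
    for task in tasks:
        unknown = [d for d in task.get("depends_on", []) if d not in known_ids]
        if unknown:
            raise ValueError(f"{task['id']} depends on unknown tasks: {unknown}")

    ordered, placed = [], set()
    pending = list(tasks)
    while pending:
        # Take the earliest pending task whose dependencies are already placed.
        ready = next(
            (t for t in pending
             if all(d in placed for d in t.get("depends_on", []))),
            None,
        )
        if ready is None:
            stuck = [t["id"] for t in pending]
            raise ValueError(f"circular dependency among: {stuck}")
        ordered.append(ready)
        placed.add(ready["id"])
        pending.remove(ready)
    return ordered
```

This is quadratic in the number of tasks, which should be acceptable for queues of a few dozen items. Raising on unknown or circular dependencies gives the flow a clear point to hand the plan back for human review.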
RALF Subflow
See assets/diagrams/ralf_subflow.mmd.
flowchart TD
A[Create RALF Prompt] --> B[Implementation Agent]
B --> C[Write or Modify Files]
C --> D[Run Gates]
D -->|pass| E[Review Diff]
D -->|fail| F[Summarize Failure]
F --> G{Retry Count < Max?}
G -->|yes| H[Create Repair Prompt]
H --> B
G -->|no| I[Defer or Human Escalation]
E -->|safe| J[Commit]
E -->|risky| K[Human Approval]
A proper RALF loop does not mean “let the model edit forever.” It means bounded work, repeated attempts, objective feedback, traceable state, and hard stop rules.
RALF Nodes
PopNextTask
Suggested node type: Python/shared-store node
Purpose: Choose the next task.
Priority:
- Repair stack.
- Main work queue.
- Deferred stack.
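Pseudo-logic, written as a Python function over the shared store:
def pop_next_task(shared):
    planning = shared["planning"]
    if planning["repair_stack"]:
        # LIFO: the newest repair is handled first.
        shared["current_task"] = planning["repair_stack"].pop()
        return "repair"
    elif planning["work_queue"]:
        # FIFO: the oldest planned task is handled first.
        shared["current_task"] = planning["work_queue"].pop(0)
        return "task"
    elif planning["deferred_stack"]:
        shared["current_task"] = planning["deferred_stack"].pop()
        return "deferred"
    else:
        return "empty"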
TaskFeasibilityCheck
Suggested node type: Python plus optional LLM review node
Purpose: Decide whether a task is ready.
Checks:
dependencies completed
allowed files are known
acceptance criteria exist
risk level acceptable
no unresolved spec conflict
no dependency addition required without approval
Actions:
ready -> CreateRalfPrompt
blocked -> PushDeferred
needs_approval -> HumanApproval
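The mechanical part of these checks could be sketched in Python as below. The function name, the `high` risk level, and the `max_auto_risk` parameter are assumptions for illustration (the design only shows `low` and `medium`). Spec conflicts and dependency additions are left to the optional LLM review and are not covered here.

```python
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}  # "high" is an assumed level


def check_feasibility(task: dict, completed_ids: set,
                      max_auto_risk: str = "medium") -> str:
    """Return "ready", "blocked", or "needs_approval" for a task."""
    # Dependencies must already be completed.
    if any(dep not in completed_ids for dep in task.get("depends_on", [])):
        return "blocked"
    # The task must say which files it may touch and how it will be verified.
    if not task.get("files_allowed") or not task.get("acceptance_tests"):
        return "blocked"
    # Unknown risk labels are treated as the highest level, to be safe.
    risk = RISK_ORDER.get(task.get("risk_level", "low"), RISK_ORDER["high"])
    if risk > RISK_ORDER[max_auto_risk]:
        return "needs_approval"
    return "ready"
```

Dependency and completeness checks run before the risk check, so a task that is both blocked and risky is deferred first and only reaches human approval once it could actually run.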
CreateRalfPrompt
Suggested node type: Markdown/template node
Purpose: Create the bounded implementation prompt.
The full template is included in assets/prompts/ralf_implementer.md.
ImplementationAgent
Suggested node type: LLM agent / command-capable node
Purpose: Modify files for the current task.
Allowed operations:
read project files
write allowed files
run safe local commands
inspect test output
Forbidden without approval:
delete project files broadly
access secrets
modify .git internals
push to remote
install dependencies
change license
run network commands
The implementation agent does not decide success. It only performs an attempt.
RunGates
Suggested node type: command/test runner node
Purpose: Run objective checks.
Python default:
ruff check src tests
mypy src
pytest
Node/TypeScript default:
npm test
npm run lint
npm run build
C/C++ default:
cmake --build build
ctest --test-dir build --output-on-failure
Zig default:
zig build test
No passing gates, no commit.
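A gate runner along these lines could be sketched with the standard library's `subprocess` module. The function name `run_gates`, the timeout, and the exact result fields are assumptions; the returned dict loosely mirrors the `gates` section of the shared store (`commands`, `results`, `passed`).

```python
import shlex
import subprocess


def run_gates(commands: list, cwd: str = ".", timeout: int = 600) -> dict:
    """Run each gate command and report pass/fail from its exit code."""
    results = []
    for command in commands:
        try:
            proc = subprocess.run(
                shlex.split(command),  # no shell, so no shell injection
                cwd=cwd,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            passed = proc.returncode == 0
            output = proc.stdout + proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing tool or a hung gate counts as a failure, not a skip.
            passed = False
            output = str(exc)
        results.append({
            "command": command,
            "result": "pass" if passed else "fail",
            "output": output,
        })
    # An empty gate list never passes: "no passing gates, no commit".
    all_passed = bool(results) and all(r["result"] == "pass" for r in results)
    return {"commands": list(commands), "results": results, "passed": all_passed}
```

For the Python default above, a call might look like `run_gates(["ruff check src tests", "mypy src", "pytest"], cwd=project_root)`, where `project_root` is whatever path the intake node stored. All gates run even after the first failure, so the failure analyzer sees the full picture in one pass.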
AnalyzeGateFailure
Suggested node type: LLM structured-output node
Purpose: Convert raw gate output into a repair task.
Output:
failure:
  kind: syntax|type|test|lint|integration|unknown
  summary: ""
  likely_files:
    - ""
  suggested_repair: ""
  same_failure_count: 1
PushRepairTask
Suggested node type: Python/shared-store node
Purpose: Push a focused repair item onto repair_stack.
Example repair task:
id: REPAIR-TASK-004-2
parent_task_id: TASK-004
title: "Fix failing parser checksum test"
description: "pytest reports checksum validation accepts invalid record."
files_allowed:
  - "src/**"
  - "tests/**"
acceptance_tests:
  - "failing checksum test passes"
DiffReview
Suggested node type: git diff node plus LLM review node
Purpose: Inspect the diff before commit.
Check for:
only allowed files changed
no secrets added
no unrelated deletion
no massive unexpected rewrite
tests added for behavior changes
docs updated when needed
license unchanged
no dependency change without approval
Actions:
safe -> CommitChanges
risky -> HumanApproval
bad_diff -> RevertOrPatch
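The "only allowed files changed" check is mechanical enough to do without an LLM. The sketch below compares a list of changed paths against a task's `files_allowed` patterns using the standard library's `fnmatch`. The function name is hypothetical, and the changed paths are assumed to come from something like `git diff --name-only`.

```python
from fnmatch import fnmatchcase


def find_out_of_scope(changed_files: list, files_allowed: list) -> list:
    """Return the changed paths that match none of the allowed patterns.

    Note: fnmatch's "*" also matches "/", so "src/**" and "src/*" both
    cover nested paths here. This is looser than gitignore-style globbing.
    """
    return [
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in files_allowed)
    ]
```

A non-empty result is a reasonable trigger for the risky or bad_diff action; which of the two applies is a policy choice the design leaves open.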
HumanApproval
Suggested node type: human review / pause node
Purpose: Pause on risk boundaries.
Require approval for:
new dependencies
database/schema migration
large diff
file deletion
license changes
network access
secrets/config changes
test removal
public API break
hardware voltage/current/safety code changes
CommitChanges
Suggested node type: git command node
Purpose: Commit passing, reviewed changes.
Commit message template:
{{ current_task.id }}: {{ current_task.title }}
Requirements: {{ current_task.source_requirement_ids }}
Gates: passed
Suggested command pattern:
git status --short
git add <allowed changed files>
git commit -m "TASK-001: implement config loader"
UpdateTraceability
Suggested node type: YAML/Markdown writer node
Purpose: Record what happened.
Suggested files:
docs/dev_log.md
docs/verification_log.md
docs/traceability.md
.ralf/tasks.yaml
.ralf/run_log.md
Example trace record:
task_id: TASK-004
requirements:
  - REQ-002
commit: abc1234
gates:
  - command: pytest
    result: pass
  - command: ruff check src tests
    result: pass
files_changed:
  - src/project/config.py
  - tests/test_config.py
summary: "Implemented YAML config loader with validation tests."
Existing Node Mapping
| Flow function | Suggested existing node type |
|---|---|
| Project path input | Input/config node |
| Validate folder layout | Python script node |
| Read spec/docs | File reader / Markdown node |
| Scan repo | Directory scanner / Python node |
| Summarize project | LLM prompt node |
| Extract requirements | LLM structured-output node |
| Build task queue | LLM prompt node + Python node |
| Queue/stack mutation | Python shared-store node |
| RALF implementation | LLM agent / command-capable node |
| Run gates | Shell command node |
| Analyze failures | LLM structured-output node |
| Review diff | Git diff node + LLM review node |
| Approval | Human review node |
| Commit | Git command node |
| Traceability | YAML/Markdown writer node |
| Final report | Markdown/artifact writer node |
Where a specialized node does not already exist, use a generic Python or Command node before creating a new custom node.
Recommended Visual Layout in PocketFlow Creator
Arrange the canvas in lanes:
Lane 1: Intake
  ProjectInput -> ValidateProjectLayout -> ScanProjectFiles -> LoadSpecAndDocs
Lane 2: Planning
  BuildProjectContext -> AnalyzeSpec -> CreateWorkQueue
Lane 3: Queue Controller
  QueueEmptyCheck -> PopNextTask -> TaskFeasibilityCheck
Lane 4: RALF Loop
  CreateRalfPrompt -> ImplementationAgent -> RunGates -> AnalyzeGateFailure -> PushRepairTask
Lane 5: Review and Commit
  DiffReview -> HumanApproval -> CommitChanges -> UpdateTraceability
Lane 6: Reporting
  FinalReport
Minimal First Version
Do not start with the whole autonomous software factory. Start with this vertical slice:
ProjectInput
-> ValidateProjectLayout
-> LoadSpecAndDocs
-> AnalyzeSpec
-> CreateWorkQueue
-> PopNextTask
-> CreateRalfPrompt
-> ImplementationAgent
-> RunGates
-> DiffReview
-> CommitChanges
-> UpdateTraceability
-> QueueEmptyCheck
Add later:
HumanApproval
DeferredStack
RepairStack
Parallel doc indexing
Model routing
Dependency approval
Security scan
Coverage gate
Prompt Roles
Use separate prompt nodes instead of one giant prompt:
- spec_analyst.md
- task_planner.md
- ralf_implementer.md
- failure_analyzer.md
- diff_reviewer.md
This makes each LLM step easier to test and safer to modify.
Guardrails
The implementation agent may edit:
src/**
tests/**
docs/**
README.md
examples/**
It may not edit without approval:
.env
secrets/**
.git/**
LICENSE
pyproject.toml
package.json
requirements.txt
deployment/**
database migrations
hardware voltage/current control files
For hardware projects, anything that could energize VPP, VCC, GPIO direction, socket interlocks, programmer safety logic, motor control, heater control, or other physical-world effects should trigger human review.
Stop Rules
Suggested hard limits:
max_iterations_per_task: 5
max_same_failure_count: 3
max_diff_lines_without_approval: 500
max_files_changed_without_approval: 12
require_tests_for_behavior_change: true
require_commit_after_passing_gates: true
Stop and escalate when:
the same test fails three times
the agent asks to delete large sections
the agent wants a new dependency
the agent wants network access
a spec contradiction appears
task acceptance criteria are not testable
the diff is unrelated to the task
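The numeric limits can be enforced in plain Python rather than left to the model. The sketch below covers only the four countable limits. The class name, function name, and returned action strings (`continue`, `needs_approval`, `escalate`) are assumptions for illustration, not names defined by the flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopRules:
    # Defaults copied from the suggested hard limits.
    max_iterations_per_task: int = 5
    max_same_failure_count: int = 3
    max_diff_lines_without_approval: int = 500
    max_files_changed_without_approval: int = 12


def next_action(rules: StopRules, retry_count: int, same_failure_count: int,
                diff_lines: int, files_changed: int) -> str:
    """Decide whether the loop may continue, needs a human, or must stop."""
    # Hard stops come first: repeated identical failures or too many attempts.
    if same_failure_count >= rules.max_same_failure_count:
        return "escalate"
    if retry_count >= rules.max_iterations_per_task:
        return "escalate"
    # Oversized diffs are not failures, but they need a human to sign off.
    if diff_lines > rules.max_diff_lines_without_approval:
        return "needs_approval"
    if files_changed > rules.max_files_changed_without_approval:
        return "needs_approval"
    return "continue"
```

Keeping the limits in a frozen dataclass means one run cannot quietly loosen its own rules partway through.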
Recommended .ralf/ Folder
Have the flow create a control folder:
.ralf/
├── run_config.yaml
├── queue.yaml
├── deferred.yaml
├── repair_stack.yaml
├── assumptions.yaml
├── decisions.md
├── gate_log.md
├── traceability.yaml
├── prompts/
│   ├── spec_analyst.md
│   ├── task_planner.md
│   ├── implementer.md
│   ├── failure_analyzer.md
│   └── diff_reviewer.md
└── logs/
This gives the loop durable memory without relying on chat history.
Recommended Build Order
Build the PocketFlow Creator flow in this order:
- Project intake and validation.
- Spec/doc loading.
- Requirement extraction.
- Queue creation.
- Single-task RALF loop.
- Gate runner.
- Git commit.
- Traceability log.
- Repair stack.
- Deferred stack.
- Human approval.
- Final report.
The first test project should be tiny: a Python package with one missing function, one spec file, one docs file, and one failing test. The flow should read the spec, create one task, implement it, run pytest, commit, and write the traceability log.
Once that works, commit the flow definition locally:
feat: add queue-driven RALF software development flow
The key is to make the first version boring and verifiable. Once the single-task loop works reliably, queue and stack orchestration becomes straightforward.
If you got this far, you are set to experiment and modify the flow to fit your own requirements.
Have fun and KEEP CODING!