Designing a PocketFlow Creator Flow for Queue-Driven RALF Software Development

Post Statistics

  • This post has 2371 words.
  • Estimated read time is 11.29 minute(s).

Purpose

This design describes a PocketFlow Creator workflow that accepts a software project folder containing a docs/ subfolder and a project specification, then uses a queue-driven RALF loop to implement the project in ordered, testable increments.

The central idea is simple:

The LLM may propose.
The queue schedules.
The gates decide.
Git preserves.
The human approves high-risk changes.

This prevents the system from becoming a vague autonomous agent that edits code until it feels done. Instead, it becomes a controlled development workflow: read the specification, extract requirements, create a work queue, implement one bounded task, run gates, commit, update traceability, and repeat.


Expected Project Layout

The flow expects a project folder shaped approximately like this:

my_project/
├── docs/
│   ├── architecture.md
│   ├── requirements.md
│   ├── design_notes.md
│   └── ...
├── project_spec.md
├── pyproject.toml / package.json / CMakeLists.txt / etc.
├── src/
├── tests/
└── README.md

The project_spec.md file is the top-level source of truth. Files under docs/ provide supporting architecture, requirements, design notes, and constraints.


What the Flow Does

At a high level, the flow:

  1. Validates the project folder.
  2. Reads project_spec.md and the docs/ folder.
  3. Builds a compact project context.
  4. Extracts requirements, constraints, assumptions, and open questions.
  5. Converts the requirements into ordered implementation tasks.
  6. Places those tasks into a main FIFO work queue.
  7. Uses a repair stack for immediate gate failures.
  8. Uses a deferred stack for blocked or risky tasks.
  9. Runs one bounded RALF loop per task.
  10. Runs objective gates such as tests, linting, type checks, and builds.
  11. Reviews the diff for scope and safety.
  12. Commits passing work.
  13. Updates traceability records.
  14. Repeats until the queue is empty or human intervention is required.

Full Flow Diagram

See assets/diagrams/software_dev_ralf_flow.mmd.




flowchart TD
    A[Project Input] --> B[Validate Project Layout]
    B -->|valid| C[Scan Project Files]
    B -->|missing_spec_or_docs| Z[Human Repair Request]

    C --> D[Load Spec and Docs]
    D --> E[Build Project Context]
    E --> F[Analyze Spec]
    F --> G[Create Ordered Work Queue]
    G --> H{Queue Empty?}

    H -->|yes| Y[Generate Final Report]
    H -->|no| I[Pop Next Work Item]

    I --> J[Task Feasibility Check]
    J -->|ready| K[Create RALF Prompt]
    J -->|blocked| X[Push to Deferred Stack]

    K --> L[RALF Implement Iteration]
    L --> M[Run Quality Gates]

    M -->|pass| N[Diff Review]
    M -->|fail| O{Retry Limit?}

    O -->|no| P[Analyze Failure]
    P --> Q[Push Repair Task]
    Q --> L

    O -->|yes| X

    N -->|safe| R[Commit Changes]
    N -->|risky| S[Human Approval]

    S -->|approved| R
    S -->|rejected| T[Revert or Patch]

    R --> U[Update Traceability]
    U --> H

    X --> V{Main Queue Empty?}
    V -->|no| H
    V -->|yes| W{Deferred Stack Empty?}
    W -->|no| I2[Pop Deferred Task]
    I2 --> J
    W -->|yes| Y

Shared Store

The shared store is the backbone of the flow. The model should not be trusted to remember the project. Durable state belongs in the shared store and on disk.

See assets/schemas/shared_store_schema.yaml.

Major shared-store sections:

project:
  root: ""
  docs_dir: ""
  spec_path: ""

context:
  spec_text: ""
  docs: {}
  repo_map: {}
  dependency_summary: ""

planning:
  requirements: []
  assumptions: []
  constraints: []
  work_queue: []
  repair_stack: []
  deferred_stack: []
  completed_tasks: []
  blocked_tasks: []

current_task:
  id: null
  title: ""
  source_requirement_ids: []
  description: ""
  acceptance_tests: []
  risk_level: "low"

ralf:
  implementation_prompt: ""
  last_agent_output: ""
  last_gate_output: ""
  retry_count: 0

gates:
  commands: []
  results: []
  passed: false

traceability:
  requirement_to_task: {}
  task_to_commits: {}
  task_to_tests: {}
  verification_log: []
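As a starting point, the sketch below builds an empty shared store with exactly these sections and fills the project paths that ProjectInput is meant to output. It assumes the shared store is a plain Python dict (as in the PocketFlow library); the function name new_shared_store is hypothetical.

```python
from pathlib import Path
from typing import Any


def new_shared_store(project_root: str,
                     spec_filename: str = "project_spec.md",
                     docs_folder: str = "docs") -> dict[str, Any]:
    """Build an empty shared store; only the project paths are filled in."""
    root = Path(project_root).resolve()
    return {
        "project": {
            "root": str(root),
            "docs_dir": str(root / docs_folder),
            "spec_path": str(root / spec_filename),
        },
        "context": {"spec_text": "", "docs": {}, "repo_map": {},
                    "dependency_summary": ""},
        # Each planning key gets its own fresh list.
        "planning": {key: [] for key in (
            "requirements", "assumptions", "constraints", "work_queue",
            "repair_stack", "deferred_stack", "completed_tasks",
            "blocked_tasks")},
        "current_task": {"id": None, "title": "", "source_requirement_ids": [],
                         "description": "", "acceptance_tests": [],
                         "risk_level": "low"},
        "ralf": {"implementation_prompt": "", "last_agent_output": "",
                 "last_gate_output": "", "retry_count": 0},
        "gates": {"commands": [], "results": [], "passed": False},
        "traceability": {"requirement_to_task": {}, "task_to_commits": {},
                         "task_to_tests": {}, "verification_log": []},
    }
```

Keeping every section present from the start means later nodes can read and append without existence checks.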

Queue and Stack Model

The flow should use three separate structures.

Main Work Queue

The main work queue is FIFO:

pop from front
append new normal tasks to back

This preserves the planner’s intended development order.

Repair Stack

The repair stack is LIFO:

push newest gate failure on top
pop newest repair first

This keeps the loop focused on the most recent failure while the relevant diff is still small.

Deferred Stack

The deferred stack holds blocked, unclear, or risky tasks. These should be revisited only after the main queue drains or after a dependency is satisfied.
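The three disciplines can be sketched with a deque for the FIFO queue and plain lists for the two LIFO stacks. The class and method names are hypothetical; a deque makes front pops cheap, although a plain list (as in the shared-store schema) is fine for small queues.

```python
from collections import deque


class TaskQueues:
    """Hypothetical container for the main queue and the two stacks."""

    def __init__(self, planned_tasks):
        self.work_queue = deque(planned_tasks)  # FIFO, in planner order
        self.repair_stack = []                  # LIFO, newest gate failure on top
        self.deferred_stack = []                # LIFO, revisited last

    def add_task(self, task):
        self.work_queue.append(task)            # new normal tasks go to the back

    def push_repair(self, task):
        self.repair_stack.append(task)

    def defer(self, task):
        self.deferred_stack.append(task)

    def pop_work(self):
        return self.work_queue.popleft()        # pop from the front

    def pop_repair(self):
        return self.repair_stack.pop()          # newest repair first

    def pop_deferred(self):
        return self.deferred_stack.pop()


queues = TaskQueues(["TASK-001", "TASK-002"])
queues.add_task("TASK-003")
queues.push_repair("REPAIR-TASK-001-1")
queues.push_repair("REPAIR-TASK-001-2")
print(queues.pop_work())    # TASK-001
print(queues.pop_repair())  # REPAIR-TASK-001-2
```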


Node Design

1. ProjectInput

Suggested existing node type: input/config node
Purpose: Accept project root, spec filename, and docs folder name.

Inputs:

project_root: "/path/to/project"
spec_filename: "project_spec.md"
docs_folder: "docs"

Outputs:

project.root
project.docs_dir
project.spec_path

Action:

default -> ValidateProjectLayout

2. ValidateProjectLayout

Suggested existing node type: Python/script node
Purpose: Confirm required paths exist and determine whether the project is a git repository.

Checks:

project root exists
docs/ exists
project_spec.md exists
git repo exists or can be initialized

Actions:

valid -> ScanProjectFiles
missing_spec_or_docs -> HumanRepairRequest
not_git_repo -> GitInitPrompt

Git should not be silently initialized without approval.
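A minimal sketch of the check as a script node that returns the action name. Two assumptions: a missing project root is routed the same way as a missing spec or docs folder, and "is a git repo" is approximated by a .git entry at the root (a stricter check could run `git rev-parse --is-inside-work-tree`). It never runs `git init` itself.

```python
from pathlib import Path


def validate_project_layout(project_root: str,
                            spec_filename: str = "project_spec.md",
                            docs_folder: str = "docs") -> str:
    """Return the ValidateProjectLayout action for this project root."""
    root = Path(project_root)
    if not root.is_dir():
        return "missing_spec_or_docs"  # assumption: same human repair path
    if not (root / docs_folder).is_dir() or not (root / spec_filename).is_file():
        return "missing_spec_or_docs"
    # Simplification: .git may be a directory or (for worktrees) a file.
    if not (root / ".git").exists():
        return "not_git_repo"          # route to GitInitPrompt; never init silently
    return "valid"
```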


3. ScanProjectFiles

Suggested existing node type: directory scanner or Python node
Purpose: Build a safe project map and detect the language stack.

Ignore:

.git/
.venv/
venv/
node_modules/
dist/
build/
__pycache__/
.cache/
.env
secrets/

Detect stack hints:

pyproject.toml -> Python
package.json -> Node/TypeScript
Cargo.toml -> Rust
build.zig -> Zig
CMakeLists.txt -> C/C++
pom.xml -> Java/Maven
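The scanner can be sketched with os.walk, pruning ignored directories in place so the walk never enters them. Assumptions: ignored directory names are matched at any depth, and stack markers are only looked for at the project root.

```python
import os
from pathlib import Path

IGNORED_DIRS = {".git", ".venv", "venv", "node_modules", "dist", "build",
                "__pycache__", ".cache", "secrets"}
IGNORED_FILES = {".env"}
STACK_HINTS = {
    "pyproject.toml": "Python",
    "package.json": "Node/TypeScript",
    "Cargo.toml": "Rust",
    "build.zig": "Zig",
    "CMakeLists.txt": "C/C++",
    "pom.xml": "Java/Maven",
}


def scan_project_files(project_root: str) -> dict:
    """Return a repo map of relative POSIX paths plus detected stacks."""
    root = Path(project_root)
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] prunes ignored directories from the walk.
        dirnames[:] = sorted(d for d in dirnames if d not in IGNORED_DIRS)
        for name in sorted(filenames):
            if name in IGNORED_FILES:
                continue
            files.append((Path(dirpath) / name).relative_to(root).as_posix())
    stacks = sorted({stack for marker, stack in STACK_HINTS.items()
                     if (root / marker).is_file()})
    return {"files": files, "stacks": stacks}
```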

4. LoadSpecAndDocs

Suggested existing node type: file reader / Markdown node
Purpose: Load project_spec.md and supporting docs.

Outputs:

context.spec_text
context.docs

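Docs are treated as project truth alongside the spec, which remains the top-level source of truth. Contradictions between the docs and the spec should be flagged in a later step rather than silently resolved.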
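A sketch of the loader, assuming supporting docs are Markdown files (anything under docs/ matching *.md, recursively) stored in UTF-8. Keys in context.docs are paths relative to docs/.

```python
from pathlib import Path


def load_spec_and_docs(spec_path: str, docs_dir: str) -> dict:
    """Return the context.spec_text and context.docs values."""
    docs_root = Path(docs_dir)
    docs = {
        path.relative_to(docs_root).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(docs_root.rglob("*.md"))
    }
    return {
        "spec_text": Path(spec_path).read_text(encoding="utf-8"),
        "docs": docs,
    }
```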


5. BuildProjectContext

Suggested existing node type: LLM prompt node
Purpose: Create a compact repository summary.

The summary should include:

architecture
important files
test system
build commands
known constraints
likely package manager

The node should not invent missing facts.


6. AnalyzeSpec

Suggested existing node type: LLM structured-output node
Purpose: Extract requirements, constraints, assumptions, and open questions.

Output schema:

requirements:
  - id: REQ-001
    title: ""
    description: ""
    priority: must|should|could
    acceptance:
      - ""
    likely_files:
      - ""
constraints:
  - id: CON-001
    text: ""
assumptions:
  - id: ASM-001
    text: ""
open_questions:
  - id: Q-001
    text: ""

Actions:

ok -> CreateWorkQueue
needs_human_clarification -> HumanSpecReview

Implementation should not begin until open questions are resolved or explicitly converted into assumptions.
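The routing half of AnalyzeSpec can be kept out of the LLM entirely: parse the structured output, reject malformed results, and pick the action. This sketch assumes the node asks the model for the schema above as JSON (the standard library has no YAML parser); the validation rules are illustrative.

```python
import json

REQUIRED_KEYS = ("requirements", "constraints", "assumptions", "open_questions")
PRIORITIES = {"must", "should", "could"}


def route_spec_analysis(raw_output: str) -> tuple[str, dict]:
    """Validate the analyst's JSON output and choose the AnalyzeSpec action."""
    analysis = json.loads(raw_output)  # raises ValueError on malformed JSON
    missing = [k for k in REQUIRED_KEYS if k not in analysis]
    if missing:
        raise ValueError(f"analysis is missing sections: {missing}")
    for req in analysis["requirements"]:
        if req.get("priority") not in PRIORITIES:
            raise ValueError(f"{req.get('id')}: bad priority {req.get('priority')!r}")
        if not req.get("acceptance"):
            raise ValueError(f"{req.get('id')}: no acceptance criteria")
    # Any open question blocks implementation until a human resolves it
    # or converts it into an explicit assumption.
    if analysis["open_questions"]:
        return "needs_human_clarification", analysis
    return "ok", analysis
```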


7. CreateWorkQueue

Suggested existing node type: LLM structured-output node plus Python sorting node
Purpose: Convert requirements into ordered, independently testable tasks.

Example task:

- id: TASK-002
  title: "Implement config loader"
  depends_on: ["TASK-001"]
  source_requirement_ids: ["REQ-002"]
  risk_level: medium
  files_allowed:
    - "src/**"
    - "tests/**"
    - "docs/**"
  acceptance_tests:
    - "config loader handles valid YAML"
    - "config loader rejects malformed YAML"

Recommended ordering:

  1. Project skeleton.
  2. Tests and fixtures.
  3. Core interfaces.
  4. Smallest vertical slice.
  5. Feature increments.
  6. Error handling.
  7. Documentation.
  8. Packaging.
  9. Final verification.
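The "Python sorting node" half of CreateWorkQueue can be a stable topological sort: Kahn's algorithm with ties broken by the planner's original position, so dependencies are respected while the planner's intended order is kept wherever possible. A sketch, assuming each task carries an id and an optional depends_on list:

```python
import heapq


def order_tasks(tasks: list[dict]) -> list[dict]:
    """Order tasks so each follows its depends_on entries, else planner order."""
    position = {t["id"]: i for i, t in enumerate(tasks)}
    by_id = {t["id"]: t for t in tasks}
    indegree = {t["id"]: 0 for t in tasks}
    dependents = {t["id"]: [] for t in tasks}
    for t in tasks:
        for dep in t.get("depends_on", []):
            if dep not in by_id:
                raise ValueError(f"{t['id']} depends on unknown task {dep}")
            indegree[t["id"]] += 1
            dependents[dep].append(t["id"])
    # Min-heap of planner positions for tasks whose dependencies are all done.
    ready = [position[tid] for tid, n in indegree.items() if n == 0]
    heapq.heapify(ready)
    ordered = []
    while ready:
        tid = tasks[heapq.heappop(ready)]["id"]
        ordered.append(by_id[tid])
        for child in dependents[tid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, position[child])
    if len(ordered) != len(tasks):
        raise ValueError("dependency cycle in planned tasks")
    return ordered
```

Raising on cycles or unknown dependencies is a deliberate choice: a broken plan should go back to the planner, not into the queue.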

RALF Subflow

See assets/diagrams/ralf_subflow.mmd.

flowchart TD
    A[Create RALF Prompt] --> B[Implementation Agent]
    B --> C[Write or Modify Files]
    C --> D[Run Gates]
    D -->|pass| E[Review Diff]
    D -->|fail| F[Summarize Failure]
    F --> G{Retry Count < Max?}
    G -->|yes| H[Create Repair Prompt]
    H --> B
    G -->|no| I[Defer or Human Escalation]
    E -->|safe| J[Commit]
    E -->|risky| K[Human Approval]

A proper RALF loop does not mean “let the model edit forever.” It means bounded work, repeated attempts, objective feedback, traceable state, and hard stop rules.
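The bounded loop can be sketched independently of any LLM: the agent and the gates are passed in as callables (hypothetical stand-ins for ImplementationAgent and RunGates), the gate result alone decides success, and the loop always terminates.

```python
from typing import Callable


def run_ralf_task(task: dict,
                  implement: Callable[[dict, str], None],
                  run_gates: Callable[[], tuple[bool, str]],
                  max_iterations: int = 5) -> tuple[str, int]:
    """Run implement -> gates -> repair at most max_iterations times.

    Returns ("passed", attempts) or ("escalate", max_iterations).
    """
    feedback = ""
    for attempt in range(1, max_iterations + 1):
        implement(task, feedback)          # one bounded attempt by the agent
        passed, gate_output = run_gates()  # objective check decides success
        if passed:
            return "passed", attempt
        feedback = gate_output             # the next attempt sees the failure
    return "escalate", max_iterations      # defer or hand to a human
```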


RALF Nodes

PopNextTask

Suggested node type: Python/shared-store node
Purpose: Choose the next task.

Priority:

  1. Repair stack.
  2. Main work queue.
  3. Deferred stack.

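Logic, written as a Python function over the shared store (a plain dict shaped like the schema above):

def pop_next_task(shared: dict) -> str:
    """Choose the next task: repair stack, then main queue, then deferred stack."""
    planning = shared["planning"]
    if planning["repair_stack"]:
        shared["current_task"] = planning["repair_stack"].pop()    # LIFO: newest repair
        return "repair"
    if planning["work_queue"]:
        shared["current_task"] = planning["work_queue"].pop(0)     # FIFO: front of queue
        return "task"
    if planning["deferred_stack"]:
        shared["current_task"] = planning["deferred_stack"].pop()  # LIFO
        return "deferred"
    return "empty"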

TaskFeasibilityCheck

Suggested node type: Python plus optional LLM review node
Purpose: Decide whether a task is ready.

Checks:

dependencies completed
allowed files are known
acceptance criteria exist
risk level acceptable
no unresolved spec conflict
no dependency addition required without approval

Actions:

ready -> CreateRalfPrompt
blocked -> PushDeferred
needs_approval -> HumanApproval
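The mechanical checks can run in Python before any optional LLM review. In this sketch the risk threshold ("high" needs approval), the needs_new_dependency flag, and the unresolved_conflicts set of requirement ids are all hypothetical.

```python
def check_feasibility(task: dict, completed_ids: set[str],
                      unresolved_conflicts: set[str]) -> str:
    """Return the TaskFeasibilityCheck action for one task."""
    if any(dep not in completed_ids for dep in task.get("depends_on", [])):
        return "blocked"          # dependencies not completed yet
    if not task.get("files_allowed") or not task.get("acceptance_tests"):
        return "blocked"          # unknown file boundary or no acceptance criteria
    if set(task.get("source_requirement_ids", [])) & unresolved_conflicts:
        return "blocked"          # spec conflict still open
    if task.get("risk_level") == "high" or task.get("needs_new_dependency"):
        return "needs_approval"
    return "ready"
```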

CreateRalfPrompt

Suggested node type: Markdown/template node
Purpose: Create the bounded implementation prompt.

The full template is included in assets/prompts/ralf_implementer.md.
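The template file itself is not reproduced here, so the sketch below uses a short hypothetical stand-in to show the mechanics: string.Template fills the prompt from current_task fields plus the previous gate output, and substitute() fails loudly if a placeholder is left unfilled.

```python
from string import Template

# Hypothetical stand-in for assets/prompts/ralf_implementer.md.
RALF_TEMPLATE = Template("""\
You are implementing exactly one task: $task_id - $title
Requirements: $requirements
Description: $description

Edit only files matching:
$files_allowed

The task is done only when these acceptance tests pass:
$acceptance_tests

Gate output from the previous attempt (empty on the first attempt):
$gate_output
""")


def _bullets(items: list[str]) -> str:
    return "\n".join(f"- {item}" for item in items)


def create_ralf_prompt(task: dict, last_gate_output: str = "") -> str:
    """Fill the template from the current task (raises KeyError if a field is missing)."""
    return RALF_TEMPLATE.substitute(
        task_id=task["id"],
        title=task["title"],
        requirements=", ".join(task.get("source_requirement_ids", [])),
        description=task.get("description", ""),
        files_allowed=_bullets(task.get("files_allowed", [])),
        acceptance_tests=_bullets(task.get("acceptance_tests", [])),
        gate_output=last_gate_output,
    )
```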


ImplementationAgent

Suggested node type: LLM agent / command-capable node
Purpose: Modify files for the current task.

Allowed operations:

read project files
write allowed files
run safe local commands
inspect test output

Forbidden without approval:

delete project files broadly
access secrets
modify .git internals
push to remote
install dependencies
change license
run network commands

The implementation agent does not decide success. It only performs an attempt.


RunGates

Suggested node type: command/test runner node
Purpose: Run objective checks.

Python default:

ruff check src tests
mypy src
pytest

Node/TypeScript default:

npm test
npm run lint
npm run build

C/C++ default:

cmake --build build
ctest --test-dir build --output-on-failure

Zig default:

zig build test

No passing gates, no commit.
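A gate runner can be sketched with subprocess, using the default commands above. Two design choices, both assumptions: every gate runs even after one fails, so the failure analyzer sees all output, and an empty gate list counts as a failure, which enforces "no passing gates, no commit".

```python
import subprocess

# Default gate commands per detected stack, taken from the lists above.
DEFAULT_GATES = {
    "Python": [["ruff", "check", "src", "tests"], ["mypy", "src"], ["pytest"]],
    "Node/TypeScript": [["npm", "test"], ["npm", "run", "lint"], ["npm", "run", "build"]],
    "C/C++": [["cmake", "--build", "build"],
              ["ctest", "--test-dir", "build", "--output-on-failure"]],
    "Zig": [["zig", "build", "test"]],
}


def run_gates(commands: list[list[str]], cwd: str, timeout_s: int = 600) -> dict:
    """Run every gate and record pass/fail; no gates at all counts as a failure."""
    results = []
    for cmd in commands:
        try:
            proc = subprocess.run(cmd, cwd=cwd, capture_output=True,
                                  text=True, timeout=timeout_s)
            ok, output = proc.returncode == 0, proc.stdout + proc.stderr
        except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
            ok, output = False, f"{type(exc).__name__}: {exc}"  # missing tool or hung gate
        results.append({"command": " ".join(cmd),
                        "result": "pass" if ok else "fail",
                        "output": output})
    passed = bool(results) and all(r["result"] == "pass" for r in results)
    return {"results": results, "passed": passed}
```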


AnalyzeGateFailure

Suggested node type: LLM structured-output node
Purpose: Convert raw gate output into a repair task.

Output:

failure:
  kind: syntax|type|test|lint|integration|unknown
  summary: ""
  likely_files:
    - ""
  suggested_repair: ""
  same_failure_count: 1
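The same_failure_count field is more reliable when computed in Python than when reported by the model. One heuristic, sketched here as an assumption: hash the gate output after masking details that change between otherwise identical runs (memory addresses, durations), then count consecutive identical fingerprints.

```python
import hashlib
import re


def failure_fingerprint(gate_output: str) -> str:
    """Hash gate output after masking run-to-run noise (heuristic)."""
    text = re.sub(r"0x[0-9a-fA-F]+", "<addr>", gate_output)  # memory addresses
    text = re.sub(r"\d+(\.\d+)?s\b", "<time>", text)          # durations like 0.42s
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def update_same_failure_count(history: list[str], gate_output: str) -> int:
    """Record this failure and return how many times in a row it has occurred."""
    fp = failure_fingerprint(gate_output)
    history.append(fp)
    count = 0
    for previous in reversed(history):
        if previous != fp:
            break
        count += 1
    return count
```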

PushRepairTask

Suggested node type: Python/shared-store node
Purpose: Push a focused repair item onto repair_stack.

Example repair task:

id: REPAIR-TASK-004-2
parent_task_id: TASK-004
title: "Fix failing parser checksum test"
description: "pytest reports checksum validation accepts invalid record."
files_allowed:
  - "src/**"
  - "tests/**"
acceptance_tests:
  - "failing checksum test passes"
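Building and pushing the repair item is plain shared-store mutation. This sketch follows the REPAIR-&lt;parent id&gt;-&lt;attempt&gt; naming in the example and assumes the failure dict has the kind and summary fields from AnalyzeGateFailure; the generated title and acceptance text are illustrative.

```python
def make_repair_task(parent: dict, failure: dict, attempt: int) -> dict:
    """Turn an analyzed gate failure into a focused repair item."""
    return {
        "id": f"REPAIR-{parent['id']}-{attempt}",
        "parent_task_id": parent["id"],
        "title": f"Fix {failure['kind']} failure in {parent['id']}",
        "description": failure["summary"],
        # A repair never widens the parent's file boundary.
        "files_allowed": list(parent["files_allowed"]),
        "acceptance_tests": [f"the failing {failure['kind']} gate passes"],
    }


def push_repair_task(shared: dict, repair: dict) -> None:
    shared["planning"]["repair_stack"].append(repair)  # LIFO: newest on top
```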

DiffReview

Suggested node type: git diff node plus LLM review node
Purpose: Inspect the diff before commit.

Check for:

only allowed files changed
no secrets added
no unrelated deletion
no massive unexpected rewrite
tests added for behavior changes
docs updated when needed
license unchanged
no dependency change without approval

Actions:

safe -> CommitChanges
risky -> HumanApproval
bad_diff -> RevertOrPatch
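The mechanical part of the review can run on `git diff --numstat` output before the LLM reviewer reads the full diff. In this sketch the 500-line and 12-file defaults come from the Stop Rules section, and treating pure-deletion files as risky is an assumed heuristic for "no unrelated deletion". Secrets, tests, and docs checks are left to the LLM reviewer.

```python
from fnmatch import fnmatchcase


def parse_numstat(numstat: str) -> list[tuple[str, int, int]]:
    """Parse `git diff --numstat` lines into (path, added, deleted)."""
    rows = []
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files show "-" for both counts; count them as 0 lines.
        rows.append((path, int(added) if added != "-" else 0,
                     int(deleted) if deleted != "-" else 0))
    return rows


def review_diff(numstat: str, files_allowed: list[str],
                max_lines: int = 500, max_files: int = 12) -> str:
    """Return "safe", "risky", or "bad_diff" from the diff statistics alone."""
    rows = parse_numstat(numstat)
    if any(not any(fnmatchcase(p, g) for g in files_allowed) for p, _, _ in rows):
        return "bad_diff"   # a file outside the task's boundary changed
    total = sum(a + d for _, a, d in rows)
    if total > max_lines or len(rows) > max_files:
        return "risky"      # large diffs need human approval
    if any(d > 0 and a == 0 for _, a, d in rows):
        return "risky"      # pure deletions (assumed heuristic)
    return "safe"
```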

HumanApproval

Suggested node type: human review / pause node
Purpose: Pause on risk boundaries.

Require approval for:

new dependencies
database/schema migration
large diff
file deletion
license changes
network access
secrets/config changes
test removal
public API break
hardware voltage/current/safety code changes

CommitChanges

Suggested node type: git command node
Purpose: Commit passing, reviewed changes.

Commit message template:

{{ current_task.id }}: {{ current_task.title }}

Requirements: {{ current_task.source_requirement_ids }}
Gates: passed

Suggested command pattern:

git status --short
git add <allowed changed files>
git commit -m "TASK-002: Implement config loader"
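A sketch of the commit node in Python. It renders the template above (joining the requirement ids with commas, an assumption about how the list should print), stages only the reviewed files, and passes the multi-line message on stdin with `git commit -F -`, since a single `-m` would carry only one paragraph.

```python
import subprocess


def render_commit_message(task: dict) -> str:
    """Render the commit message template for one task."""
    reqs = ", ".join(task.get("source_requirement_ids", []))
    return (f"{task['id']}: {task['title']}\n\n"
            f"Requirements: {reqs}\n"
            f"Gates: passed\n")


def commit_task(repo: str, task: dict, changed_files: list[str]) -> str:
    """Stage only the reviewed files, commit, and return the short hash."""
    subprocess.run(["git", "add", "--", *changed_files], cwd=repo, check=True)
    # "-F -" tells git to read the full commit message from stdin.
    subprocess.run(["git", "commit", "-F", "-"], cwd=repo, check=True,
                   input=render_commit_message(task), text=True)
    head = subprocess.run(["git", "rev-parse", "--short", "HEAD"], cwd=repo,
                          check=True, capture_output=True, text=True)
    return head.stdout.strip()
```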

UpdateTraceability

Suggested node type: YAML/Markdown writer node
Purpose: Record what happened.

Suggested files:

docs/dev_log.md
docs/verification_log.md
docs/traceability.md
.ralf/tasks.yaml
.ralf/run_log.md

Example trace record:

task_id: TASK-002
requirements:
  - REQ-002
commit: abc1234
gates:
  - command: pytest
    result: pass
  - command: ruff check src tests
    result: pass
files_changed:
  - src/project/config.py
  - tests/test_config.py
summary: "Implemented YAML config loader with validation tests."
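Folding a trace record into the shared store's traceability section is a few dictionary updates. Assumptions in this sketch: requirement_to_task maps each requirement to a list of task ids, and any changed file under tests/ counts as a test for the task. Writing the result out to docs/traceability.md or .ralf/ files is left to a separate writer node.

```python
def update_traceability(shared: dict, record: dict) -> None:
    """Merge one trace record into shared["traceability"]."""
    trace = shared["traceability"]
    task_id = record["task_id"]
    for req in record["requirements"]:
        tasks = trace["requirement_to_task"].setdefault(req, [])
        if task_id not in tasks:
            tasks.append(task_id)
    trace["task_to_commits"].setdefault(task_id, []).append(record["commit"])
    test_files = [f for f in record["files_changed"] if f.startswith("tests/")]
    trace["task_to_tests"].setdefault(task_id, []).extend(test_files)
    trace["verification_log"].append(
        {"task_id": task_id, "commit": record["commit"], "gates": record["gates"]})
```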

Existing Node Mapping

| Flow function | Suggested existing node type |
| --- | --- |
| Project path input | Input/config node |
| Validate folder layout | Python script node |
| Read spec/docs | File reader / Markdown node |
| Scan repo | Directory scanner / Python node |
| Summarize project | LLM prompt node |
| Extract requirements | LLM structured-output node |
| Build task queue | LLM prompt node + Python node |
| Queue/stack mutation | Python shared-store node |
| RALF implementation | LLM agent / command-capable node |
| Run gates | Shell command node |
| Analyze failures | LLM structured-output node |
| Review diff | Git diff node + LLM review node |
| Approval | Human review node |
| Commit | Git command node |
| Traceability | YAML/Markdown writer node |
| Final report | Markdown/artifact writer node |

Where a specialized node does not already exist, use a generic Python or Command node before creating a new custom node.


Recommended Visual Layout in PocketFlow Creator

Arrange the canvas in lanes:

Lane 1: Intake
ProjectInput -> ValidateProjectLayout -> ScanProjectFiles -> LoadSpecAndDocs

Lane 2: Planning
BuildProjectContext -> AnalyzeSpec -> CreateWorkQueue

Lane 3: Queue Controller
QueueEmptyCheck -> PopNextTask -> TaskFeasibilityCheck

Lane 4: RALF Loop
CreateRalfPrompt -> ImplementationAgent -> RunGates -> AnalyzeGateFailure -> PushRepairTask

Lane 5: Review and Commit
DiffReview -> HumanApproval -> CommitChanges -> UpdateTraceability

Lane 6: Reporting
FinalReport

Minimal First Version

Do not start with the whole autonomous software factory. Start with this vertical slice:

ProjectInput
  -> ValidateProjectLayout
  -> LoadSpecAndDocs
  -> AnalyzeSpec
  -> CreateWorkQueue
  -> PopNextTask
  -> CreateRalfPrompt
  -> ImplementationAgent
  -> RunGates
  -> DiffReview
  -> CommitChanges
  -> UpdateTraceability
  -> QueueEmptyCheck

Add later:

HumanApproval
DeferredStack
RepairStack
Parallel doc indexing
Model routing
Dependency approval
Security scan
Coverage gate

Prompt Roles

Use separate prompt nodes instead of one giant prompt:

  • spec_analyst.md
  • task_planner.md
  • ralf_implementer.md
  • failure_analyzer.md
  • diff_reviewer.md

This makes each LLM step easier to test and safer to modify.


Guardrails

The implementation agent may edit:

src/**
tests/**
docs/**
README.md
examples/**

It may not edit without approval:

.env
secrets/**
.git/**
LICENSE
pyproject.toml
package.json
requirements.txt
deployment/**
database migrations
hardware voltage/current control files

For hardware projects, anything that could energize VPP, VCC, GPIO direction, socket interlocks, programmer safety logic, motor control, heater control, or other physical-world effects should trigger human review.
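The path-based part of these guardrails can be enforced with fnmatch (whose * also matches "/", so "src/**" covers nested files). This sketch normalizes paths first so "src/../LICENSE" cannot slip through, checks the approval list before the allow list, and denies anything not explicitly allowed. Entries such as database migrations or hardware control files are project-specific and would need their own patterns.

```python
import posixpath
from fnmatch import fnmatchcase

ALLOWED = ["src/**", "tests/**", "docs/**", "README.md", "examples/**"]
NEEDS_APPROVAL = [".env", "*/.env", "secrets/**", ".git/**", "LICENSE",
                  "pyproject.toml", "package.json", "requirements.txt",
                  "deployment/**"]


def classify_path(path: str) -> str:
    """Return "allowed" or "needs_approval" for a repo-relative POSIX path."""
    path = posixpath.normpath(path)  # collapse "src/../LICENSE" to "LICENSE"
    if any(fnmatchcase(path, pattern) for pattern in NEEDS_APPROVAL):
        return "needs_approval"
    if any(fnmatchcase(path, pattern) for pattern in ALLOWED):
        return "allowed"
    return "needs_approval"          # default-deny anything not listed
```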


Stop Rules

Suggested hard limits:

max_iterations_per_task: 5
max_same_failure_count: 3
max_diff_lines_without_approval: 500
max_files_changed_without_approval: 12
require_tests_for_behavior_change: true
require_commit_after_passing_gates: true

Stop and escalate when:

the same test fails three times
the agent asks to delete large sections
the agent wants a new dependency
the agent wants network access
a spec contradiction appears
task acceptance criteria are not testable
the diff is unrelated to the task
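The numeric limits translate directly into a check that lists every rule an attempt has hit. This sketch covers the measurable rules; the judgment-based triggers (spec contradictions, network or dependency requests, unrelated diffs) come from other nodes, and require_commit_after_passing_gates belongs to the commit step. The function and argument names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopRules:
    max_iterations_per_task: int = 5
    max_same_failure_count: int = 3
    max_diff_lines_without_approval: int = 500
    max_files_changed_without_approval: int = 12
    require_tests_for_behavior_change: bool = True


def escalation_reasons(rules: StopRules, *, iterations: int, same_failure_count: int,
                       diff_lines: int, files_changed: int,
                       behavior_changed: bool, tests_changed: bool) -> list[str]:
    """Return every hard limit hit; an empty list means the loop may continue."""
    reasons = []
    if iterations >= rules.max_iterations_per_task:
        reasons.append("iteration budget spent")
    if same_failure_count >= rules.max_same_failure_count:
        reasons.append("same failure repeated")
    if diff_lines > rules.max_diff_lines_without_approval:
        reasons.append("diff too large")
    if files_changed > rules.max_files_changed_without_approval:
        reasons.append("too many files changed")
    if rules.require_tests_for_behavior_change and behavior_changed and not tests_changed:
        reasons.append("behavior changed without tests")
    return reasons
```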

Recommended .ralf/ Folder

Have the flow create a control folder:

.ralf/
├── run_config.yaml
├── queue.yaml
├── deferred.yaml
├── repair_stack.yaml
├── assumptions.yaml
├── decisions.md
├── gate_log.md
├── traceability.yaml
├── prompts/
│   ├── spec_analyst.md
│   ├── task_planner.md
│   ├── implementer.md
│   ├── failure_analyzer.md
│   └── diff_reviewer.md
└── logs/

This gives the loop durable memory without relying on chat history.
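Creating the skeleton is idempotent with pathlib: mkdir(exist_ok=True) and touch(exist_ok=True) leave existing content alone, so rerunning the flow never wipes the queue or logs. A sketch that creates empty files only; what goes into each file is up to the nodes that own it.

```python
from pathlib import Path

RALF_FILES = ["run_config.yaml", "queue.yaml", "deferred.yaml", "repair_stack.yaml",
              "assumptions.yaml", "decisions.md", "gate_log.md", "traceability.yaml"]
PROMPT_FILES = ["spec_analyst.md", "task_planner.md", "implementer.md",
                "failure_analyzer.md", "diff_reviewer.md"]


def init_ralf_folder(project_root: str) -> Path:
    """Create .ralf/ and its control files without overwriting existing ones."""
    ralf = Path(project_root) / ".ralf"
    (ralf / "prompts").mkdir(parents=True, exist_ok=True)
    (ralf / "logs").mkdir(exist_ok=True)
    for name in RALF_FILES:
        (ralf / name).touch(exist_ok=True)  # updates mtime, keeps content
    for name in PROMPT_FILES:
        (ralf / "prompts" / name).touch(exist_ok=True)
    return ralf
```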


Recommended Build Order

Build the PocketFlow Creator flow in this order:

  1. Project intake and validation.
  2. Spec/doc loading.
  3. Requirement extraction.
  4. Queue creation.
  5. Single-task RALF loop.
  6. Gate runner.
  7. Git commit.
  8. Traceability log.
  9. Repair stack.
  10. Deferred stack.
  11. Human approval.
  12. Final report.

The first test project should be tiny: a Python package with one missing function, one spec file, one docs file, and one failing test. The flow should read the spec, create one task, implement it, run pytest, commit, and write the traceability log.
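A fixture like that can be generated by a short script. Everything in it is hypothetical (the tinycalc package, the add() requirement, the file contents); the `pythonpath` setting under `[tool.pytest.ini_options]` (pytest 7 or later) lets pytest import from src/ without installing the package, so the test fails only because add() is missing.

```python
from pathlib import Path

FIXTURE = {
    "project_spec.md": "# Tiny calc\n\nREQ-001: add(a, b) returns the sum of two integers.\n",
    "docs/requirements.md": "# Requirements\n\n- REQ-001: addition of two integers.\n",
    "pyproject.toml": (
        '[project]\nname = "tinycalc"\nversion = "0.1.0"\n\n'
        '[tool.pytest.ini_options]\npythonpath = ["src"]\n'
    ),
    # add() is intentionally missing; implementing it is the flow's one task.
    "src/tinycalc/__init__.py": '"""Tiny calculator."""\n',
    "tests/test_add.py": "from tinycalc import add\n\n\ndef test_add():\n    assert add(2, 3) == 5\n",
}


def make_fixture_project(root: str) -> list[str]:
    """Write the fixture files under root and return their relative paths."""
    base = Path(root)
    for rel, text in FIXTURE.items():
        path = base / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
    return sorted(FIXTURE)
```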

Once that works, commit the flow definition locally:

feat: add queue-driven RALF software development flow

The key is to make the first version boring and verifiable. Once the single-task loop works reliably, queue and stack orchestration becomes straightforward.

If you got this far, you are ready to experiment with the flow and modify it to fit your own requirements.

Have fun and KEEP CODING!
