Code Execution

StirrupJS includes three code execution backends: local, Docker, and E2B cloud sandboxes. All backends provide the same code_exec tool interface.

Overview

The code_exec tool allows agents to execute shell commands in an isolated environment. This is one of the most powerful built-in tools.

Common use cases: - Installing and using Python packages - Data analysis and visualization - File processing and conversion - Running scripts and automation - Image and video manipulation

Local Execution (Default)

The LocalCodeExecToolProvider executes commands in a temporary directory on your machine.

Basic Usage

import { LocalCodeExecToolProvider } from '@stirrup/stirrup';

const agent = new Agent({
  client,
  tools: [new LocalCodeExecToolProvider()],
  finishTool: SIMPLE_FINISH_TOOL,
});

Or use DEFAULT_TOOLS which includes local execution:

import { DEFAULT_TOOLS } from '@stirrup/stirrup';

const agent = new Agent({
  client,
  tools: DEFAULT_TOOLS,  // Includes LocalCodeExecToolProvider
  finishTool: SIMPLE_FINISH_TOOL,
});

Features

Isolated temp directory: Each session gets a fresh temp directory
Security: Commands run with current user permissions
uv package manager: Recommended for Python packages (fast, reliable)
File access: Read/write files in execution directory
Output capture: Returns stdout, stderr, and exit code

Example

await using session = agent.session({ outputDir: './output' });

await session.run(`
  Create a Python script that generates a sine wave chart:
  1. Use matplotlib to create the chart
  2. Save as sine_wave.png
  3. Call finish with the file path
`);

// File automatically saved to ./output/sine_wave.png

Command Restrictions

You can restrict which commands are allowed:

const provider = new LocalCodeExecToolProvider([
  'python',
  'node',
  'uv',
  'pip',
]);

Custom Temp Directory

const provider = new LocalCodeExecToolProvider(
  undefined,  // No command restrictions
  '/custom/temp/path'
);

Docker Execution

The DockerCodeExecToolProvider runs commands in Docker containers for better isolation.

Installation

Requires Docker to be installed and running on your machine.

Basic Usage

import { DockerCodeExecToolProvider } from '@stirrup/stirrup';

const agent = new Agent({
  client,
  tools: [new DockerCodeExecToolProvider('python:3.12-slim')],
  finishTool: SIMPLE_FINISH_TOOL,
});

Configuration

const provider = new DockerCodeExecToolProvider(
  'python:3.12-slim',  // Docker image
  undefined,           // Allowed commands (undefined = all)
  undefined,           // Temp directory base path
  'Docker code execution'  // Custom description
);

Features

Full isolation: Commands run in containers
Consistent environment: Same environment every time
Security: Better isolation than local execution
Custom images: Use any Docker image
File access: Files automatically mounted

Example Images

// Python with scientific packages
new DockerCodeExecToolProvider('python:3.12-slim')

// Node.js
new DockerCodeExecToolProvider('node:20-alpine')

// Custom image with dependencies
new DockerCodeExecToolProvider('myorg/custom-agent-env:latest')

Limitations

Slower startup (container creation)
Requires Docker installed
More resource intensive

E2B Cloud Sandboxes

The E2BCodeExecToolProvider runs commands in E2B cloud sandboxes - fully isolated, cloud-based environments.

Installation

Sign up for E2B: https://e2b.dev

Set your API key:

export E2B_API_KEY='your-key-here'

Basic Usage

import { E2BCodeExecToolProvider } from '@stirrup/stirrup';

const agent = new Agent({
  client,
  tools: [new E2BCodeExecToolProvider({
    apiKey: process.env.E2B_API_KEY!,
    template: 'base',
  })],
  finishTool: SIMPLE_FINISH_TOOL,
});

Configuration

const provider = new E2BCodeExecToolProvider({
  apiKey: process.env.E2B_API_KEY!,
  template: 'base',           // E2B template
  metadata: { user: 'alice' }, // Optional metadata
  timeoutMs: 60000,           // Execution timeout
});

Features

Cloud-based: No local setup required
Scalable: Automatic resource management
Secure: Complete isolation in cloud
Fast: Pre-warmed environments
Consistent: Same environment everywhere

Templates

E2B provides pre-built templates:

base: Basic Ubuntu environment
python: Python with common packages
node: Node.js environment
Custom: Create your own templates

See E2B templates for details.

Advantages

No Docker installation needed
Better for production deployments
Scales automatically
Consistent across development/production

Limitations

Requires internet connection
Costs money (has free tier)
Slightly higher latency than local

Full Example

See examples/e2b-example.ts for a complete working example using E2B with the code interpreter template.

Choosing a Backend

Backend	Best For	Pros	Cons
Local	Development, prototyping	Fast, free, simple setup	Less isolation, requires local environment
Docker	Production, better isolation	Good isolation, consistent env	Slower startup, needs Docker
E2B	Production, scale	Cloud-based, secure, scalable	Costs money, needs internet

Recommendations

Development: Use LocalCodeExecToolProvider (fastest iteration)
Production (self-hosted): Use DockerCodeExecToolProvider (better isolation)
Production (cloud): Use E2BCodeExecToolProvider (easiest deployment)

Using Python Packages

All backends support package installation. We recommend using uv for fast, reliable package management.

Example Task

await session.run(`
  Create a data visualization:
  1. Use 'uv pip install matplotlib pandas'
  2. Load data from data.csv
  3. Create a bar chart
  4. Save as chart.png
  5. Call finish with the file path
`);

Package Installation with uv

// Agent automatically uses uv
await session.run(`
  Install pandas and analyze data.csv:
  1. Run: uv pip install pandas
  2. Load the CSV
  3. Calculate statistics
  4. Print results
`);

Traditional pip

await session.run(`
  Use traditional pip if needed:
  1. Run: pip install requests
  2. Fetch data from API
  3. Process and save
`);

Working with Files

Creating Files

The agent can create files in its execution directory:

await using session = agent.session({ outputDir: './results' });

await session.run(`
  Create three files:
  1. data.json - some JSON data
  2. chart.png - a visualization
  3. report.txt - a summary

  Call finish with all three file paths.
`);

// All three files saved to ./results/

Reading Files

Files in the execution directory can be read:

await session.run(`
  Read the file 'input.txt' and count words.
  The file should be in the current directory.
`);

File Persistence

Files persist within a session:

await session.run('Create data.csv with sample data');
await session.run('Load data.csv and create a chart');
// data.csv is still available

But are cleaned up when session ends:

await using session = agent.session();
await session.run('Create temp.txt');
// Session ends, temp.txt is deleted

Security Considerations

Local Execution

Commands run with your user permissions
Can access files on your system
Use allowedCommands to restrict execution
Consider using Docker for untrusted code

Docker Execution

Commands isolated in containers
Cannot access host filesystem (except mounted volumes)
Better for running untrusted code
Still requires Docker daemon access

E2B Execution

Full isolation in cloud sandboxes
Cannot access local system
Best security for untrusted code
Network access can be restricted

Best Practices

Validate user input: Never pass unsanitized user input directly to commands
Restrict commands: Use allowedCommands in local/Docker
Use Docker/E2B: For production or untrusted code
Monitor execution: Use structured logging to track commands
Set timeouts: Prevent infinite loops/long-running commands

Advanced Usage

Custom Command Restrictions

const provider = new LocalCodeExecToolProvider([
  'python',    // Allow Python
  'python3',   // And Python3
  'uv',        // And uv
  'node',      // And Node.js
  // Block everything else (bash, rm, etc.)
]);

Custom Description

Help the agent understand the environment:

const provider = new LocalCodeExecToolProvider(
  undefined,
  undefined,
  'Execute commands in a Python 3.12 environment with numpy, pandas, and matplotlib pre-installed. Use uv for package management.'
);

Multiple Execution Environments

const pythonEnv = new DockerCodeExecToolProvider('python:3.12');
const nodeEnv = new DockerCodeExecToolProvider('node:20');

const agent = new Agent({
  client,
  tools: [pythonEnv, nodeEnv],  // Error: Only one code exec provider allowed
  finishTool: SIMPLE_FINISH_TOOL,
});

One Code Exec Provider

Agents can only have one code execution provider. Use Docker/E2B with pre-installed tools if you need multiple languages.

Examples

Data Analysis

await session.run(`
  Analyze sales_data.csv:
  1. Install pandas: uv pip install pandas
  2. Load the CSV
  3. Calculate total sales, average, and top 5 products
  4. Print results in a formatted table
`);

Image Processing

await session.run(`
  Process image.jpg:
  1. Install Pillow: uv pip install Pillow
  2. Resize to 800x600
  3. Apply blur filter
  4. Save as processed.jpg
  5. Call finish with the path
`);

Web Scraping

await session.run(`
  Scrape data from example.com:
  1. Install beautifulsoup4 requests: uv pip install beautifulsoup4 requests
  2. Fetch the page
  3. Extract all article titles
  4. Save to titles.json
  5. Call finish with the path
`);

Next Steps

Creating Tools - Build custom tools
Tool Providers - Advanced tool management
Examples - More code execution examples