Agent Mode Guide

Gemini Scribe v4.0 is agent-first: every conversation is powered by an AI assistant that can actively work with your vault through tool calling. This guide covers how to use the agent effectively and safely.

What is the Agent?

In v4.0+, the agent is always available and can:

Read and search files in your vault
Create, modify, and organize notes
Search the web for information
Fetch and analyze web pages
Execute multiple operations in sequence
Work autonomously while respecting your permissions

New in v4.0: Agent mode is no longer a separate feature you enable - it's the core of how Gemini Scribe works. Every chat is an agent session with full tool-calling capabilities.

Getting Started

1. Open Agent Chat

Use Command Palette: "Gemini Scribe: Open Gemini Chat"
Or click the sparkles icon (✨) in the ribbon
Or use your configured hotkey
You can also manage sessions directly from the command palette with:
- "New Agent Session"
- "Browse Agent Sessions"
- "Link Project to Agent Session"
- "Agent Session Settings"

2. Initialize Vault Context (Recommended)

In an empty agent session, click "Initialize Vault Context"
The agent will analyze your vault structure and create AGENTS.md
This helps the agent understand your vault organization
Update periodically as your vault grows

3. Configure Permissions

Choose which operations require confirmation in Settings → Gemini Scribe:

Create files
Modify files
Delete files
Move/rename files

When the agent needs to perform these operations, an in-chat confirmation request appears with interactive buttons. You can also use "Don't ask again this session" for trusted workflows. See Tool Confirmations for details.

Core Features

Tool Calling

The agent can execute various tools to help with your tasks:

User: Find all my meeting notes from this week and create a summary

Agent: I'll help you find and summarize your meeting notes. Let me:
1. Search for meeting notes from this week
2. Read their contents
3. Create a summary document

[Executes find_files_by_name tool]
[Executes read_file tool for each result]
[Executes write_file tool to create summary]

File Attachments & Drag-and-Drop

You can include images, audio, video, PDFs, and text files in your chat. Files are automatically classified and routed:

Adding Files:

Paste images directly from your clipboard (Ctrl/Cmd+V)
Drag and drop files from your vault's file explorer into the input box
Drag and drop files from your OS file manager (if they're inside the vault)
Drag and drop folders to include all contained files
Multiple files can be attached to a single message

How Files Are Routed:

When you drop a file, the plugin classifies it based on its extension:

| Category | Extensions | Action |
| --- | --- | --- |
| Text | `.md`, `.txt`, `.ts`, `.js`, `.json`, `.html`, `.css`, `.py`, etc. | Added as context chips (the AI reads the file content) |
| Binary (Gemini-supported) | `.png`, `.jpg`, `.gif`, `.webp`, `.pdf`, `.mp3`, `.wav`, `.mp4`, `.mov`, etc. | Sent as inline data (the AI processes the binary content directly) |
| Unsupported | `.zip`, `.exe`, `.dmg`, etc. | Skipped with a notification |
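The routing rule above can be sketched in a few lines of Python. This is only an illustration, not the plugin's implementation: the function names are hypothetical, the extension sets are the subset listed in the table (the real lists are longer), and case-insensitive matching is an assumption.

```python
from pathlib import PurePosixPath

# Subsets copied from the table above; the "etc." entries mean the plugin's
# real lists are longer. Treat these as illustrative, not exhaustive.
TEXT_EXTENSIONS = {".md", ".txt", ".ts", ".js", ".json", ".html", ".css", ".py"}
BINARY_EXTENSIONS = {".png", ".jpg", ".gif", ".webp", ".pdf", ".mp3", ".wav", ".mp4", ".mov"}


def classify_file(path):
    """Return 'text', 'binary', or 'unsupported' for a dropped file path."""
    # Assumption: extensions are compared case-insensitively.
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in TEXT_EXTENSIONS:
        return "text"
    if suffix in BINARY_EXTENSIONS:
        return "binary"
    return "unsupported"


def route_files(paths):
    """Group dropped paths by category, preserving drop order."""
    routed = {"text": [], "binary": [], "unsupported": []}
    for path in paths:
        routed[classify_file(path)].append(path)
    return routed
```

Under these assumptions, dropping a folder that contains a note, a screenshot, and an archive would yield one context chip, one inline attachment, and one skipped file.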

Supported Binary Formats:

Images: PNG, JPEG, GIF, WebP, HEIC, HEIF
Audio: WAV, MP3, AAC, FLAC
Video: MP4, MPEG, MOV, FLV, MPG, WebM, WMV, 3GP
Documents: PDF

How It Works:

When you add a binary file, a preview appears above the input (thumbnail for images, icon + filename for other types)
Click the × button on any preview to remove it before sending
Pasted/external images are saved to your vault's attachment folder; vault files are referenced in place
The AI receives both the file content and its vault path for referencing
Images appear in the chat as wikilink embeds (e.g., ![[attachments/pasted-image.png]]); non-image attachments (PDF, audio, video) are listed by vault path and type label

Size Limits:

Cumulative inline data is limited to 20 MB per message
Files exceeding the limit are skipped with a notification
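A minimal sketch of how a cumulative cap like this might be applied. Several details are assumptions: the function name is hypothetical, 20 MB is taken as 20 × 1024 × 1024 bytes, and files are considered in the order they were attached, with a smaller later file still accepted if it fits.

```python
# Assumption: "20 MB" means 20 * 1024 * 1024 bytes; the plugin may count differently.
MAX_INLINE_BYTES = 20 * 1024 * 1024


def select_attachments(files):
    """Split (name, size_in_bytes) pairs into accepted and skipped names.

    Files are considered in order. A file that would push the running total
    over the cap is skipped, and later files are still considered.
    """
    accepted, skipped = [], []
    total = 0
    for name, size in files:
        if total + size > MAX_INLINE_BYTES:
            skipped.append(name)  # the user would see a notification here
            continue
        accepted.append(name)
        total += size
    return accepted, skipped
```

In this sketch a 15 MB image followed by a 10 MB video and a 4 MB PDF results in the video being skipped while the image and PDF are sent.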

Privacy Note: Attached files are sent to the Gemini API for analysis. Avoid attaching files containing sensitive, confidential, or personal information.

Usage Examples:

User: [pastes screenshot] What's wrong with this error message?

Agent: I can see a TypeScript error in your screenshot. The issue is...

User: [drops a PDF] Summarize the key points from this document

Agent: Based on the PDF, here are the main points...

User: [drops a folder with mixed files] Review these project files

Agent: I can see the markdown notes in context and the attached images...

Combining with Context Files:

Attachments work alongside @ mentions and context files. You can:

Reference attached files in context: "Look at the screenshot and update @ProjectNotes with the solution"
Ask the agent to embed images in notes it creates
Use file paths in wikilinks: ![[path/to/image.png]]

Edge Cases:

Large files are sent as-is (no automatic compression)
Unsupported file types are skipped with a notification
If file processing fails, you'll see a notification
Dropping a folder recursively includes all child files

Context Files & the File Shelf

Context files are displayed in a unified file shelf — a horizontal strip above the input area that shows all attached files (text files, folders, and binary attachments) in one place.

Adding files:

Type @ in the chat input to open the file picker (supports text files and Gemini-compatible binary files like images, PDFs, audio, and video)
Type / in an empty chat input to open the skill picker — select a skill to insert an activation prompt you can edit before sending (see Agent Skills for details)
Click the file icon in the session header to open the multi-select modal:
- Already-added files appear pre-checked
- Type to fuzzy-search; Enter toggles a file or folder; Esc confirms and closes
- Selecting a folder adds all markdown files inside it (folders are re-expanded each turn, so newly added files are automatically included)
- Unchecking a file or folder removes it from context
Drag and drop files or folders from the file explorer or your OS
Paste images from your clipboard

Interacting with the shelf:

Click a shelf item to open the file in Obsidian
Click the × button to remove an item
Keyboard navigation: Arrow Left/Right to move between items, Enter/Space to open, Delete/Backspace to remove
Text files and folders show a pin badge indicating they're sent with every message
Binary attachments are marked as "sent" after a message and cleaned up automatically

For detailed information about context files and advanced usage, see the Context System Guide.

Session Management

Each conversation is a separate session
Sessions persist across Obsidian restarts
Access previous sessions from the dropdown
Configure session-specific settings
Sessions are automatically titled with a YYYY-MM-DD date prefix and AI-generated description after the first exchange
All files the agent reads or writes during a session are tracked in accessed_files frontmatter for auditing and session recall
Tool execution summaries are logged to session history as collapsible callout blocks (controlled by the logToolExecution setting)

Available Tools

Read-Only Tools

find_files_by_name

Search for files by name pattern (searches filenames/paths only). Searches all file types, not just markdown:

Find all files containing "project"
Search for "*.md" files in the Projects folder
Find all PNG images in my vault with "*.png"

find_files_by_content

Search for text within file contents (grep-style search):

Find all notes mentioning "machine learning"
Search for TODO items across my vault
Find files containing the phrase "quarterly review" (case-insensitive)
Search using regex pattern: "deadline.*2024"

Supports:

Case-sensitive and case-insensitive search
Regex patterns
Context lines before/after matches
Respects system folder exclusions
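To make the options above concrete, here is a small grep-style search over an in-memory mapping of path to text. It is a sketch, not the tool's code: the function name, parameter names, and result fields are hypothetical, and folder exclusions are left out.

```python
import re


def search_content(files, pattern, case_sensitive=False, context=0):
    """Search a {path: text} mapping line by line with a regex.

    Returns one dict per matching line with its 1-based line number and up to
    `context` lines before and after it.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    results = []
    for path, text in files.items():
        lines = text.splitlines()
        for index, line in enumerate(lines):
            if regex.search(line):
                start = max(0, index - context)
                end = min(len(lines), index + context + 1)
                results.append({
                    "path": path,
                    "line": index + 1,
                    "before": lines[start:index],
                    "match": line,
                    "after": lines[index + 1:end],
                })
    return results
```

A prompt such as "Search using regex pattern: deadline.*2024" would correspond to a call like `search_content(vault, r"deadline.*2024", context=1)` in this sketch.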

read_file

Read the contents of any file in your vault. Supports text files (markdown, code, .base, .canvas) and binary files that Gemini can process (images, audio, video, PDF):

Read the contents of my daily note
Show me what's in Projects/Todo.md
Describe the image at images/diagram.png
Transcribe the recording at audio/meeting.mp3
Read the PDF at docs/report.pdf

When you ask the agent to read a binary file, it sends the file data directly to Gemini for analysis — enabling image description, audio transcription, PDF reading, and video analysis without manual drag-and-drop.

If a file doesn't exist, the agent receives a non-error response with exists: false and helpful suggestions for similar file names. This allows automation skills to probe for files without triggering error states.
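The missing-file behaviour can be pictured with the sketch below. Only the `exists` field comes from the description above. The function name, the other field names, and the use of Python's `difflib` for similar-name suggestions are assumptions made for illustration.

```python
import difflib


def probe_file(vault, path):
    """Look up `path` in a {path: content} mapping without raising on a miss."""
    if path in vault:
        return {"exists": True, "path": path, "content": vault[path]}
    # Assumption: suggestions are ranked by string similarity to the requested path.
    suggestions = difflib.get_close_matches(path, list(vault), n=3, cutoff=0.6)
    return {"exists": False, "path": path, "suggestions": suggestions}
```

Because a miss is an ordinary return value, a skill can branch on `result["exists"]` instead of handling an error.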

list_files

List files in a folder. Returns all file types (not just markdown):

Show me all files in the Archive folder
List the contents of my Templates directory
What files are in the attachments folder?

get_workspace_state

Get metadata about all Markdown files currently open in the editor. Returns each file's path, wikilink, whether it is visible in a pane, whether it is the active (focused) file, and any text the user has selected. Also includes the current project if the session is linked to one. Note: this tool only reports open Markdown editor views — PDFs, images, canvases, and other non-Markdown files are not included. Use read_file for those.

What files do I have open?
Look at what I'm working on and help me with the current file
What do I have selected?

The agent uses this to understand your workspace context without needing files to be manually added to the session. Use read_file to get the actual content of specific files the agent identifies.

Vault Operations

write_file

Create or update files:

Create a new note called "Meeting Minutes"
Update my todo list with these items

delete_file

Remove files (requires confirmation):

Delete the old draft file
Remove temporary notes from yesterday

append_content

Add text to the end of a file without rewriting the entire content. Ideal for logs, journals, and incremental updates:

Add today's entry to my daily log
Append the meeting action items to the project tracker

update_frontmatter

Safely modify note properties (frontmatter) without touching the body content:

Set the status property to "complete" on my project note
Add the "reviewed" tag to all meeting notes
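The idea of changing a property while leaving the body untouched can be sketched as follows. This is not how the plugin does it. The helper is hypothetical, handles only top-level `key: value` scalars, and writes the value without YAML quoting, so it is a sketch under those assumptions rather than a general frontmatter editor.

```python
def set_frontmatter_property(note, key, value):
    """Set a top-level scalar property, leaving the note body unchanged."""
    lines = note.split("\n")
    if lines and lines[0] == "---" and "---" in lines[1:]:
        close = lines.index("---", 1)  # closing delimiter of the frontmatter
        front, body = lines[1:close], lines[close + 1:]
    else:
        front, body = [], lines  # no frontmatter yet: one is created
    for i, line in enumerate(front):
        is_top_level = not line.startswith((" ", "\t"))
        if is_top_level and line.split(":", 1)[0].strip() == key:
            front[i] = f"{key}: {value}"  # replace the existing property
            break
    else:
        front.append(f"{key}: {value}")  # property was absent: append it
    return "\n".join(["---", *front, "---", *body])
```

Only the lines between the `---` delimiters are rewritten; everything after the closing delimiter is passed through as-is.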

move_file

Move or rename files:

Move completed tasks to the Archive folder
Rename "Untitled" to "Project Proposal"

create_folder

Create a new folder in your vault:

Create a "Meetings/2026" folder for this year's meeting notes

update_memory / read_memory

Append to or read your vault's AGENTS.md memory file. The agent uses these to remember vault-wide context — folder layout, naming conventions, user preferences — across sessions:

Remember that I keep all meeting notes under "Meetings/" by quarter
What do you remember about my vault?

update_memory requires confirmation; read_memory is read-only. The "Initialize Vault Context" button is the seed that creates AGENTS.md in the first place.

Web & Research Operations

google_search

Search the web for current information:

Search for the latest Obsidian plugin development docs
Find recent research on productivity methods

fetch_url

Retrieve and analyze web page content:

Get the content from this documentation page
Analyze this blog post and summarize key points

deep_research

Conduct multi-source research with citations and (optionally) save the report to your vault. Distinct from google_search — Deep Research runs iterative multi-turn investigation that takes minutes rather than seconds. See the Deep Research guide for scope options (web_only, vault_only, both) and example prompts.

Research the latest developments in quantum error correction and save it to Research/quantum.md

generate_image

Generate an image from a prompt and save it to your vault. The agent picks a default attachment path if you don't specify one. Available on the Gemini provider only.

Generate a watercolor diagram of a Zettelkasten workflow and embed it in my notes

Vault Search

vault_semantic_search

Search your vault by meaning, not just keywords, via the indexed File Search Store. Available when Semantic Vault Search is enabled. The agent uses this automatically when a question calls for concept-based retrieval; you don't need to invoke it directly.

Find my notes about machine learning algorithms
What did I write about project deadlines in my work folder?

Session Memory

recall_sessions

Search past agent sessions by file, project, or topic. The agent uses this tool proactively to maintain continuity across sessions — you don't need to explicitly ask it to remember. It will automatically check for relevant past sessions when you're working on files or topics that have prior history.

What did we discuss about the magic system last time?
Find sessions where we worked on the API integration
Show me past sessions for the Novel project
Continue where we left off on the character outline

Returns session summaries with title, date, files accessed, and project linkage. The agent can then read the full conversation from a past session using read_file on the returned historyPath. This enables continuity-aware conversations where the agent remembers prior decisions, approaches, and context.

Skill Tools

Gemini Scribe supports an extensible skills system based on the agentskills.io specification. Skills are self-contained packages of instructions that give the agent specialized knowledge and workflows. If you're wondering whether to use a skill or a custom prompt, see the comparison in the Skills guide.

How Skills Work

Skills are stored in your plugin state folder at gemini-scribe/Skills/. Each skill is a directory containing a SKILL.md file with instructions the agent can load on demand. The agent automatically knows which skills are available — their names and descriptions are included in every agent session.

When the agent encounters a task that matches an available skill, it will activate the skill to load its full instructions before proceeding.

activate_skill

Load a skill's full instructions or resources:

Activate the code-review skill and review my latest note
Use the meeting-notes skill to process my meeting notes

You can also ask the agent to load specific resources from a skill:

Load the reference docs from the code-review skill

create_skill

Create new skills from your conversations:

Create a skill called "daily-review" that helps me review and organize my daily notes

The agent will create a properly formatted SKILL.md file with the name, description, and instructions you provide. Skills you create will be available in all future sessions.

edit_skill

Update an existing skill's description, instructions, or both:

Update the meeting-notes skill to also extract key decisions
Change the description of my code-review skill

The agent reads the current skill content (via activate_skill), then uses edit_skill to write the updated description or body. You can update either field independently — omitting one preserves the existing value. Requires user confirmation before writing.

SKILL.md Format

Each skill follows a simple format — YAML frontmatter with a name and description, followed by markdown instructions:

---
name: my-skill
description: >-
  Description of what this skill does and when to use it.
---
# My Skill

Step-by-step instructions for the agent...

Skills can also include optional subdirectories:

references/ — Detailed reference documents
assets/ — Templates, data files
scripts/ — Reference scripts (read-only in Obsidian)
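For readers who prefer to scaffold a skill by hand, the layout described above could be produced with a short script like this one. It is a sketch with hypothetical helper names. It assumes a single-line description and writes into a temporary directory rather than a real vault.

```python
import tempfile
from pathlib import Path

# Optional subdirectories named in the list above.
OPTIONAL_SUBDIRS = ("references", "assets", "scripts")


def render_skill_md(name, description, instructions):
    """Render SKILL.md text: YAML frontmatter followed by markdown instructions.

    Assumption: `description` is a single line, so one indented line under the
    folded scalar marker is enough.
    """
    return (
        "---\n"
        f"name: {name}\n"
        "description: >-\n"
        f"  {description}\n"
        "---\n"
        f"{instructions}\n"
    )


def scaffold_skill(skills_root, name, description, instructions):
    """Create <skills_root>/<name>/ with SKILL.md and the optional subfolders."""
    skill_dir = Path(skills_root) / name
    for sub in OPTIONAL_SUBDIRS:
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(render_skill_md(name, description, instructions), encoding="utf-8")
    return skill_file


# Usage: scaffold into a throwaway directory that mirrors gemini-scribe/Skills/.
demo_root = Path(tempfile.mkdtemp()) / "gemini-scribe" / "Skills"
skill_file = scaffold_skill(
    demo_root,
    "daily-review",
    "Helps review and organize daily notes.",
    "# Daily Review\n\nStep-by-step instructions for the agent...",
)
```

In normal use the `create_skill` tool does this for you; the script is only meant to make the on-disk shape of a skill explicit.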

Discovering Available Skills

The agent automatically knows which skills are installed. Simply ask:

What skills do you have available?

Session Configuration

Session-Level Settings

Override global settings for specific conversations:

Click the settings icon next to session name
Configure:
- Model (e.g., switch to Gemini 2.5 Pro for harder reasoning)
- Temperature (creativity level)
- Top-P (response diversity)
- Custom prompt template

Permissions

Set session-specific permissions:

Bypass confirmations for trusted operations
Temporarily enable additional tools
Restrict access for sensitive sessions

Tool Confirmations

When the agent needs to perform operations that require your approval (like creating, modifying, or deleting files), an in-chat confirmation request appears directly in the conversation.

How Confirmations Work

Instead of popup modals, confirmation requests appear as interactive messages in the chat:

🔒 Permission Required

📝 Write File
Vault Operation • Requires Confirmation

Create or update a file in the vault

Parameters:
• path: "notes/Meeting-Summary.md"
• content: "# Meeting Summary..." (1,234 chars)

[✓ Allow] [✗ Cancel] [☑ Don't ask again this session]

Confirmation Actions

✓ Allow - Approve this operation

The agent proceeds with the operation
Confirmation message updates to show approval
The agent continues with subsequent steps

✗ Cancel - Decline this operation

The agent cancels the operation
Confirmation message updates to show cancellation
The agent may explain why it cannot continue or suggest alternatives

☑ Don't ask again this session - Create session-level permission

Check this box before clicking Allow
The agent won't request confirmation for this tool again during the current session
Useful for repetitive operations you trust
Important: Permission resets when you create a new session or restart Obsidian

After You Respond

Once you click a button, the confirmation request updates to show the result:

✓ Permission granted: Write File was allowed

✗ Permission denied: Write File was cancelled

Diff View for File Changes

When the agent proposes file changes (via write_file, append_content, create_skill, or edit_skill), the confirmation card includes a View Changes button that opens a side-by-side diff view. This lets you:

See exactly what will change before approving
Edit the proposed content directly in the diff view before clicking Allow
If you modify the content, the tool result reports userEdited: true so the agent knows

Enable "Always show diff view for file writes" in settings to automatically open the diff view with every confirmation instead of requiring a button click.
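The review step can be approximated outside the plugin with Python's `difflib`. Note the substitution: this sketch produces a unified text diff rather than the side-by-side view the plugin shows. Apart from `userEdited`, the function and field names are hypothetical.

```python
import difflib


def preview_write(path, current, proposed, user_final=None):
    """Build a diff for review and report whether the user changed the proposal.

    `user_final` stands in for content the user edited in the diff view;
    None means the proposal was accepted unchanged.
    """
    final = proposed if user_final is None else user_final
    diff = "".join(difflib.unified_diff(
        current.splitlines(keepends=True),
        final.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))
    return {"diff": diff, "content": final, "userEdited": final != proposed}
```

The `userEdited` flag is what lets the agent notice that the text written to disk differs from what it proposed.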

What Operations Require Confirmation

By default, these operations require confirmation:

write_file: Creating or modifying files
delete_file: Removing files
move_file: Moving or renaming files
append_content: Adding text to the end of files
create_skill: Creating new skill packages
edit_skill: Updating existing skill instructions

You can configure which operations require confirmation in Settings → Gemini Scribe → Tool Permissions (under Advanced Settings).

Session-Level Permissions

When you check "Don't ask again this session" and click Allow:

The permission is remembered for the current session only
Future uses of that tool won't prompt for confirmation
Other tool types still require confirmation (unless you've also allowed them)
The permission is cleared when you:
- Create a new session
- Load a different session
- Restart Obsidian
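The session-scoped behaviour above amounts to a small piece of state that is cleared on every session change. The class below is a hypothetical model of that rule, not the plugin's code.

```python
class SessionPermissions:
    """Tracks which confirmable tools the user has pre-approved this session."""

    def __init__(self, tools_requiring_confirmation):
        self._requires = set(tools_requiring_confirmation)
        self._allowed_this_session = set()

    def needs_confirmation(self, tool):
        """True if the tool is confirmable and not yet pre-approved."""
        return tool in self._requires and tool not in self._allowed_this_session

    def allow(self, tool, dont_ask_again=False):
        """Record an Allow click; remember it only if the box was checked."""
        if dont_ask_again:
            self._allowed_this_session.add(tool)

    def reset(self):
        """Call when a session is created or loaded, or after a restart."""
        self._allowed_this_session.clear()
```

Approving `move_file` with the box checked silences further `move_file` prompts, while `delete_file` still asks, and everything asks again after `reset()`.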

Use case example:

User: Organize my daily notes into monthly folders

[Agent requests permission to move first file]
🔒 Permission Required - Move File
[You check "Don't ask again this session" and click Allow]

[Agent proceeds to move all remaining files without additional prompts]

Reviewing Confirmation Details

Before clicking Allow, always review:

Tool Name: What operation the agent wants to perform
Parameters: File paths, content snippets, and other details to verify
File Paths: Ensure paths are correct and won't overwrite important files
Content Preview: Check the content looks reasonable (for write operations)

Example - Be careful with destructive operations:

🔒 Permission Required

🗑️ Delete File
Vault Operation • Requires Confirmation

Delete a file from the vault

Parameters:
• path: "important-research.md"  ⚠️ Double-check this path!

[✓ Allow] [✗ Cancel] [☑ Don't ask again this session]

Best Practices for Confirmations

Start Cautious: Don't use "Don't ask again" until you trust the agent's behavior for your specific task
Review File Paths: Always check paths before allowing file operations
Read-Only First: Test with read-only operations before allowing writes
Backup Important Data: Have backups before bulk operations
Cancel and Clarify: If unsure, click Cancel and ask the agent to explain what it's trying to do
Session Scope: Remember that "Don't ask again" only applies to the current session

Best Practices

1. Start with Read-Only

Begin with read-only operations to understand how the agent works:

Show me all my notes tagged with #important
Find notes I haven't updated in 30 days
Search for broken links in my vault

2. Use Clear Instructions

Be specific about what you want:

Good: "Create a weekly summary of all notes tagged #meeting from the past 7 days"
Less clear: "Summarize my meetings"

3. Review Before Confirming

When in-chat confirmation requests appear:

Read the tool name and operation type
Review all parameters (especially file paths)
Check content previews for write operations
Ensure you have backups for destructive operations
See the Tool Confirmations section for detailed guidance

4. Leverage Context Files

Add relevant files as context for better results:

Template files for consistent formatting
Style guides for writing tasks
Reference documents for research

5. Use Sessions Effectively

Create new sessions for different projects
Name sessions descriptively
Review session history for insights

Advanced Usage

Multi-Step Workflows

The agent excels at complex, multi-step tasks:

User: Organize my research notes. Group them by topic, create an index, and archive anything older than 6 months.

Agent: I'll help organize your research notes. This will involve:
1. Finding all research notes
2. Analyzing their topics
3. Creating topic-based folders
4. Moving files to appropriate folders
5. Creating an index file
6. Archiving old notes

Let me start by searching for research notes...
[Executes multiple tools in sequence]

Template-Based Operations

Use templates for consistent results:

User: Create a new project using my project template

Agent: I'll create a new project structure for you.
[Reads template]
[Creates folder structure]
[Populates with template files]
[Updates project index]

Research Assistant

Combine vault and web operations:

User: Research productivity methods and create notes for the most promising ones

Agent: I'll research productivity methods and create notes.
[Searches web for productivity methods]
[Fetches relevant articles]
[Creates structured notes]
[Links to existing notes]

Safety Features

Protected Folders

The following folders are automatically protected:

.obsidian/ - Plugin configurations
gemini-scribe/ - Plugin state files
Any folder containing plugin data
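A path check along these lines is one way such protection could work. This is a sketch with a hypothetical function name. It covers only the two named folders and assumes the plugin state folder keeps its default name `gemini-scribe`, so the broader "any folder containing plugin data" rule is not modelled.

```python
import posixpath

# Top-level folders named in the list above. Assumption: default folder names.
PROTECTED_FOLDERS = (".obsidian", "gemini-scribe")


def is_protected(path):
    """Return True if a vault-relative path falls inside a protected folder."""
    # Normalize first so "Notes/../.obsidian/x" cannot slip past the check.
    normalized = posixpath.normpath(path.lstrip("/"))
    top_level = normalized.split("/", 1)[0]
    return top_level in PROTECTED_FOLDERS
```

The check looks only at the top-level folder, so a note merely named after a protected folder elsewhere in the vault is unaffected.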

Loop Detection

Prevents infinite execution loops:

Detects repeated identical operations (the same tool called with the same parameters)
Stops once the repeat threshold is reached (default: 3)
Counts repeats within a configurable time window
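A sliding-window counter is one plausible way to implement this kind of guard. The class below is a sketch under stated assumptions: the names are hypothetical, the 30-second window is an invented default, and the call that reaches the threshold is the one that gets flagged.

```python
import json
from collections import deque


class LoopDetector:
    """Flags a tool call once it has been repeated `threshold` times in a window."""

    def __init__(self, threshold=3, window_seconds=30.0):
        self.threshold = threshold
        self.window_seconds = window_seconds  # assumption: not the plugin's default
        self._calls = deque()  # (timestamp, signature) pairs, oldest first

    def record(self, tool, params, now):
        """Record a call at time `now` (seconds); return True if it is a loop."""
        # Identical means same tool and same parameters, independent of key order.
        signature = (tool, json.dumps(params, sort_keys=True))
        # Drop calls that have aged out of the window.
        while self._calls and now - self._calls[0][0] > self.window_seconds:
            self._calls.popleft()
        self._calls.append((now, signature))
        repeats = sum(1 for _, sig in self._calls if sig == signature)
        return repeats >= self.threshold
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real caller would typically supply `time.monotonic()`.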

Error Handling

Operations stop on errors (configurable)
Clear error messages explain failures
Non-destructive fallback behaviors

Confirmation System

In-chat confirmation requests for vault operations (create, modify, delete, move)
Interactive buttons to Allow or Cancel each operation
Review tool details and parameters before approving
"Don't ask again this session" option for repetitive trusted operations
Session-level permissions reset when session ends
See Tool Confirmations for complete workflow details

Troubleshooting

Agent Not Responding

Confirm the plugin is v4.0 or later (from v4.0 the agent is always on, so there is no separate toggle to enable)
Verify API key supports function calling
Ensure selected model supports tools (all current Gemini models do)

Tools Not Available

Check tool category is enabled in settings
Verify session has proper permissions
Some tools may be incompatible with search grounding

Operations Failing

Check file paths are correct
Ensure you have vault permissions
Verify files aren't open in other applications
Check for protected folder restrictions

Performance Issues

Reduce number of context files
Use more specific search patterns
Break complex tasks into steps
Consider using faster models for simple tasks

Examples and Recipes

Daily Review

Review all notes modified today, summarize key points, and update my daily journal

Knowledge Management

Find all notes without tags, analyze their content, and suggest appropriate tags

Content Creation

Create a blog post outline based on my notes about [topic], then draft the introduction

Vault Maintenance

Find duplicate notes, broken links, and orphaned files, then create a cleanup report

Research Project

Search for information about [topic], create structured notes, and link to relevant existing notes

Tips and Tricks

Save Useful Prompts: Keep a note with prompts that work well
Chain Operations: Use "then" to connect multiple tasks
Iterate Gradually: Start simple and add complexity
Use Naming Conventions: Consistent file names help the agent
Review History: Learn from past sessions
Set Boundaries: Use permissions to stay in control
Backup Important Data: Before major operations
Experiment Safely: Use a test vault for learning

Future Possibilities

As agent mode evolves, consider these use cases:

Automated vault organization
Intelligent note linking
Research automation
Content generation pipelines
Knowledge graph analysis
Workflow automation

Remember: The agent is a powerful tool, but you remain in control. Use it to augment your thinking, not replace it.
