The First Agent-Native Data Platform

Ingest, validate, transform, store, retrieve, and analyze your data — whether you're an AI agent talking through MCP, a developer, or a data analyst. One platform for all.

Example session with the Datris Assistant:
You: I'm looking for corporate earnings data.
Assistant: Happy to set that up. Before I build anything, two quick questions so I don't guess:
  • Source: SEC EDGAR (free, official), Polygon.io, or Alpha Vantage?
  • Scope: which tickers, and how far back?
You: SEC EDGAR, Mag 7, last 2 years.
Assistant:
  🔑 Inspecting secret sec-edgar
  Creating pipeline corporate-earnings (destination: MongoDB)
  Creating tap corporate-earnings
  Test run looks good: flat fact records are flowing for all 7 tickers. Running the real load now.
  Running tap corporate-earnings
  Done. 24,217 fact records for the Mag 7 loaded into datris.corporate-earnings in ~4 seconds.
  One thing to flag for the recurring runs: each weekly run currently appends all matching facts. If you'd rather upsert, I can set keyFields on the pipeline.
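The append-versus-upsert choice the Assistant flags comes down to what happens when a recurring run sees a record it has already loaded. A minimal Python sketch of the two behaviors, assuming keyFields acts like a composite key; this illustrates the semantics only and is not Datris's implementation, and the record fields (`ticker`, `period`, `value`) are made up.

```python
from typing import Any

Record = dict[str, Any]


def append(table: list[Record], batch: list[Record]) -> list[Record]:
    """Append mode: every incoming record is added, so re-loading the same facts duplicates them."""
    return table + batch


def upsert(table: list[Record], batch: list[Record], key_fields: list[str]) -> list[Record]:
    """Upsert mode: a record whose key-field values already exist replaces the stored one."""

    def key(record: Record) -> tuple:
        return tuple(record[field] for field in key_fields)

    merged: dict[tuple, Record] = {key(r): r for r in table}
    for record in batch:
        merged[key(record)] = record  # new keys are inserted, existing keys are overwritten
    return list(merged.values())


# Hypothetical weekly runs: the second run re-fetches Q1 and adds Q2.
week1 = [{"ticker": "AAPL", "period": "Q1", "value": 1}]
week2 = [
    {"ticker": "AAPL", "period": "Q1", "value": 1},
    {"ticker": "AAPL", "period": "Q2", "value": 2},
]
print(len(append(week1, week2)))                        # 3: the Q1 fact is now duplicated
print(len(upsert(week1, week2, ["ticker", "period"])))  # 2: one row per (ticker, period)
```

Picking the key is the real design decision: it should be the smallest set of fields that identifies one fact, so a re-run of the same period overwrites rather than duplicates.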

Type what you want. The Assistant builds it.

Datris ships with an Assistant Agent inside the platform UI. Tell it what data you want. It asks a few sharp scoping questions, picks the right source and destination, generates the fetcher, requests credentials securely, runs the job, confirms the rows landed, and lets you query the result — usually in seconds, with no hand-written config.

  • Clarifies scope before building — picks the right source, destination, and cadence with you
  • Generates and runs the fetcher (tap) for external APIs and files
  • Requests credentials through a secure form — never in chat history
  • Polls job status and confirms rows actually landed before saying "done"
  • Flags things you didn't ask about — upsert vs append, fair-use policies, schema drift
  • Answers natural-language questions over the data once it's in
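"Confirms rows actually landed before saying done" implies two checks: wait until the job reaches a terminal state, then verify it wrote data. A minimal sketch of that loop with exponential backoff; the status shape (`state`, `rows_written`) and the state names are hypothetical, not Datris's job API.

```python
import time
from typing import Any, Callable

Status = dict[str, Any]
TERMINAL_STATES = {"succeeded", "failed"}  # hypothetical state names


def wait_for_job(
    get_status: Callable[[], Status],
    timeout_s: float = 60.0,
    first_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Status:
    """Poll until the job reaches a terminal state, doubling the wait between polls up to max_delay_s."""
    deadline = time.monotonic() + timeout_s
    delay = first_delay_s
    while True:
        status = get_status()
        if status["state"] in TERMINAL_STATES:
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {status['state']!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)


def rows_landed(status: Status, min_rows: int = 1) -> bool:
    """A job only counts as done if it succeeded AND actually wrote rows."""
    return status["state"] == "succeeded" and status.get("rows_written", 0) >= min_rows


# Fake status feed standing in for a real job-status endpoint.
feed = iter([{"state": "running"}, {"state": "running"}, {"state": "succeeded", "rows_written": 10}])
final = wait_for_job(lambda: next(feed), sleep=lambda seconds: None)
print(rows_landed(final))  # True
```

The separate `rows_landed` check matters because a job can succeed while writing zero rows, for example when a source filter matches nothing.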
Things you can say to it
  • "I'm looking for corporate earnings data."
  • "Ingest these PDFs into a vector store for RAG."
  • "Refresh treasury yields from FRED nightly and let me query trends."
Every request becomes a durable pipeline, schedule, and dataset you can audit, hand off, or query from the rest of the platform — not chat-only state.

Intelligence at every stage

Every step of your data pipeline is enhanced with AI. From ingestion to delivery, Datris makes data engineering accessible through natural language.

🔌 MCP Server
AI Agent Integration
The first open-source data pipeline with native Model Context Protocol (MCP) support. Agents can register pipelines, upload files, trigger jobs, profile data, and run searches.
AI Data Quality
Plain-English Validation
Write validation rules in plain English with aiRule. The AI model evaluates every row against them using reasoning and domain knowledge, so no regex is required.
AI Transformations
Natural Language Transforms
Describe row-level transforms in natural language, such as date conversion, categorization, or entity extraction, with no code required.
📐 AI Schema Generation
Auto-Config from Any File
Upload a CSV, JSON, or XML file and get a complete dataset configuration generated automatically. Skip the boilerplate entirely.
📊 AI Data Profiling
Instant Data Insights
Upload a file and get summary statistics, quality issues, and suggested validation rules — all powered by AI analysis.
🔍 AI Error Explanation
Root Cause Analysis
When jobs fail, AI analyzes the error chain and explains the root cause in plain English. No more digging through stack traces.
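To make the aiRule idea concrete: per-row AI validation amounts to rendering each row plus the plain-English rule into a prompt, asking a model for a verdict, and splitting rows into passed and failed. A minimal sketch, assuming a JSON verdict format; `complete` is a hypothetical text-completion function, and none of this is Datris's actual aiRule code. The example rule comes from the sample prompt further down this page.

```python
import json
from typing import Any, Callable

Row = dict[str, Any]
Judge = Callable[[str, Row], tuple[bool, str]]


def build_prompt(rule: str, row: Row) -> str:
    """Render one row plus the plain-English rule into a yes/no question for a model."""
    return (
        "You validate one data row against a rule.\n"
        f"Rule: {rule}\n"
        f"Row (JSON): {json.dumps(row, sort_keys=True)}\n"
        'Reply with JSON only: {"valid": true or false, "reason": "<short explanation>"}'
    )


def make_llm_judge(complete: Callable[[str], str]) -> Judge:
    """Wrap any text-completion function (hypothetical) into a per-row judge."""

    def judge(rule: str, row: Row) -> tuple[bool, str]:
        answer = json.loads(complete(build_prompt(rule, row)))
        return bool(answer["valid"]), str(answer.get("reason", ""))

    return judge


def validate_rows(rows: list[Row], rule: str, judge: Judge) -> tuple[list[Row], list[tuple[Row, str]]]:
    """Evaluate every row; keep passing rows and collect failures with the model's reason."""
    passed: list[Row] = []
    failed: list[tuple[Row, str]] = []
    for row in rows:
        ok, reason = judge(rule, row)
        if ok:
            passed.append(row)
        else:
            failed.append((row, reason))
    return passed, failed


def fake_model(prompt: str) -> str:
    # Offline stand-in for a real model call so the example runs without an API key.
    if '"revenue": -' in prompt:
        return '{"valid": false, "reason": "revenue is not positive"}'
    return '{"valid": true, "reason": "ok"}'


rows = [{"revenue": 120, "date": "2024-03-01"}, {"revenue": -5, "date": "2024-04-01"}]
rule = "revenue must be positive and date must be in 2024"
passed, failed = validate_rows(rows, rule, make_llm_judge(fake_model))
print(len(passed), failed[0][1])  # 1 revenue is not positive
```

Keeping the judge pluggable is what lets the same validation loop run against a cloud or a local model.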

Push and pull — one platform, two interfaces

AI agents and humans ingest data through the pipeline, store it across databases and vector stores, and retrieve it via MCP or the API.

Sources
  • Push: Data Upload, MinIO, Kafka
  • Pull (taps): API Tap, DB Tap, File Tap, Custom Script
Processing
  1. Preprocessor
  2. Data Quality (AI)
  3. Transformation (AI)
Storage
  • MinIO (Parquet/ORC), PostgreSQL, MongoDB, Kafka, ActiveMQ, REST API
  • Vector databases: Qdrant, Weaviate, Milvus, Chroma, pgvector
Notification
  • ActiveMQ
Agent interface: MCP over stdio or SSE (Claude, Cursor, OpenClaw, any MCP agent)
  • Push: Create Pipelines, Create & Schedule Taps, Upload Data, Trigger Processing, Configure Pipelines
  • Pull: Query PostgreSQL, Query MongoDB, Semantic Search (Vector DB), Monitor Jobs, Profile Data, Retrieve Results

Full RAG pipeline built in

Extract, chunk, embed, and upsert documents into any major vector database. Build retrieval-augmented generation workflows without leaving your pipeline.

✂️ Chunking Strategies
Choose the right chunking strategy for your use case:
Fixed-size, Sentence, Paragraph, Recursive
🧠 Embedding Providers
Generate embeddings with cloud or local models:
OpenAI, Ollama (local)
📄 Document Extraction
Extract text from virtually any document format:
PDF, Word, PowerPoint, Excel, HTML, Email, EPUB, Plain Text
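Two of the chunking strategies listed above are easy to get subtly wrong, so here is a generic Python sketch of each: fixed-size windows with overlap, and a recursive splitter that tries paragraph, then line, then word boundaries before falling back to hard cuts. These are textbook versions of the named strategies, not Datris's implementation, and the function and parameter names are hypothetical. Sizes are in characters; real pipelines often count tokens instead.

```python
def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    """Cut text into windows of `size` characters; consecutive windows share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once a window would add no characters beyond the previous one.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def recursive_chunks(text: str, size: int, separators: tuple[str, ...] = ("\n\n", "\n", " ")) -> list[str]:
    """Split on the coarsest separator first, packing pieces up to `size`; recurse into oversized pieces."""
    if len(text) <= size:
        return [text] if text.strip() else []
    if not separators:
        return fixed_size_chunks(text, size)  # no natural boundary left: hard cut
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= size:
            current = piece
        else:
            chunks.extend(recursive_chunks(piece, size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(fixed_size_chunks("abcdefghij", 4, overlap=1))          # ['abcd', 'defg', 'ghij']
print(recursive_chunks("aaaa bbbb\n\ncccc dddd eeee", 10))    # ['aaaa bbbb', 'cccc dddd', 'eeee']
```

The recursive strategy keeps related text together where it can, which usually retrieves better than fixed windows that cut mid-sentence.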
RAG Pipeline Flow
  1. Document Extraction
  2. Chunking
  3. Embeddings
  4. Vector Upsert
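The four-step flow above can be sketched end to end in plain Python. To keep it runnable offline, a hashed bag-of-words vector stands in for a real embedding model (OpenAI or Ollama in Datris), and an in-memory dictionary stands in for the vector database; both stand-ins, and every name below, are assumptions for illustration only. Step 1 (extraction) is assumed done, so the input is already plain text.

```python
import hashlib
import math
import re

DIM = 1024  # toy vector size


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryVectorStore:
    """Minimal stand-in for a vector DB: upsert by point id, search by cosine similarity."""

    def __init__(self) -> None:
        self.points: dict[str, tuple[list[float], str]] = {}

    def upsert(self, point_id: str, vector: list[float], text: str) -> None:
        self.points[point_id] = (vector, text)  # same id overwrites, so re-ingesting is idempotent

    def search(self, query: list[float], k: int = 3) -> list[tuple[float, str]]:
        # Vectors are unit-length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(query, vec)), text) for vec, text in self.points.values()]
        return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


def ingest(doc_id: str, text: str, store: InMemoryVectorStore) -> int:
    """Steps 2-4: chunk (by paragraph), embed each chunk, upsert under a stable id."""
    chunks = [p.strip() for p in text.split("\n\n") if p.strip()]
    for i, chunk in enumerate(chunks):
        store.upsert(f"{doc_id}:{i}", embed(chunk), chunk)
    return len(chunks)


store = InMemoryVectorStore()
doc = "Treasury yields rose sharply this week.\n\nThe cafeteria menu now includes soup."
ingest("memo-1", doc, store)
ingest("memo-1", doc, store)  # re-running the same document does not add duplicate points
print(store.search(embed("treasury yields"), k=1)[0][1])
```

Deriving point ids from the document id and chunk index is what makes the final upsert step safe to repeat on scheduled runs.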

Your AI agents are first-class pipeline operators

Datris ships with a native MCP server. Claude, Cursor, OpenClaw, and any MCP-compatible AI agent can register pipelines, trigger jobs, search your data, and monitor pipelines — all through natural conversation.

Transports: stdio, SSE (Server-Sent Events)
Compatible agents: Claude, Cursor, OpenClaw, any MCP-compatible agent
MCP Capabilities
  • Register pipelines and configure schemas
  • Upload data for processing
  • Trigger and monitor pipeline jobs
  • Profile data and get AI insights
  • Semantic search across vector databases
  • Query PostgreSQL and MongoDB directly
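On the wire, each of these capabilities is an MCP tool that an agent invokes with a JSON-RPC 2.0 `tools/call` request; over the stdio transport, each message is one JSON object per line. A minimal sketch of building those messages in Python. The tool name `upload_data` and its arguments are hypothetical (a client should discover Datris's real tool names with `tools/list`), and a real session must complete MCP's `initialize` handshake before calling tools.

```python
import json
from itertools import count

_request_ids = count(1)


def jsonrpc_request(method: str, params: dict) -> str:
    """Build one JSON-RPC 2.0 request, framed for MCP's stdio transport (one JSON object per line)."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}
    # json.dumps escapes newlines inside strings, so the serialized message stays on one line.
    return json.dumps(message) + "\n"


def call_tool(name: str, arguments: dict) -> str:
    """MCP tool invocations use the `tools/call` method with the tool name and its arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


# Discover available tools, then call one (hypothetical tool name and arguments).
print(jsonrpc_request("tools/list", {}), end="")
print(call_tool("upload_data", {"file": "sales_q4.csv", "pipeline": "analytics"}), end="")
```

The agent never needs to know these details; an MCP client library handles the framing, which is why any MCP-compatible agent can drive the same server.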
Example prompt
"Ingest sales_q4.csv into the analytics database and validate that 'revenue must be positive and date must be in 2024'."

Speaks every data language

Ingest structured data, unstructured documents, and archives. Output to vector stores, structured stores, or optimized columnar formats.

Input format            Default destination
CSV                     SQL DB
JSON                    NoSQL DB
XML                     NoSQL DB
Excel (.xlsx)           SQL DB
Parquet                 SQL DB
ORC                     SQL DB
PDF                     Vector DB
Word (.docx)            Vector DB
PowerPoint (.pptx)      Vector DB
HTML                    Vector DB
Email (.eml)            Vector DB
EPUB                    Vector DB
Archives (.zip, .tar)   Unpacked and routed
Plain Text              Vector DB

Destinations are fully configurable. Route any format to any target — SQL databases, NoSQL stores, vector databases, REST endpoints, Kafka topics, or ActiveMQ queues.
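The table amounts to a default lookup by file type that per-pipeline configuration can override. A minimal Python sketch of that routing rule; the destination labels (`sql`, `nosql`, `vector`, `unpack`) are hypothetical, the `.parquet`, `.orc`, and `.txt` extensions are assumed for the Parquet, ORC, and plain-text rows, and this is not Datris's configuration format.

```python
from pathlib import PurePath
from typing import Optional

# Default routing from the table; "unpack" means the archive's members are routed individually.
DEFAULT_ROUTES = {
    ".csv": "sql", ".xlsx": "sql", ".parquet": "sql", ".orc": "sql",
    ".json": "nosql", ".xml": "nosql",
    ".pdf": "vector", ".docx": "vector", ".pptx": "vector", ".html": "vector",
    ".eml": "vector", ".epub": "vector", ".txt": "vector",
    ".zip": "unpack", ".tar": "unpack",
}


def route(filename: str, overrides: Optional[dict[str, str]] = None) -> str:
    """Pick a destination by file extension; per-pipeline overrides win over the defaults."""
    ext = PurePath(filename).suffix.lower()
    routes = {**DEFAULT_ROUTES, **(overrides or {})}
    if ext not in routes:
        raise ValueError(f"no route for {ext or 'extensionless file'}: {filename}")
    return routes[ext]


print(route("sales_q4.csv"))                      # sql
print(route("Deck.PPTX"))                         # vector
print(route("sales_q4.csv", {".csv": "kafka"}))   # kafka
```

Merging overrides on top of the defaults keeps the common case zero-config while still allowing any format to go to any target.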


Your choice of AI model

Use cloud AI from Anthropic or OpenAI, or keep everything local with Ollama. No vendor lock-in — switch providers without changing your pipeline config.


How Datris compares

The only platform combining MCP-native agent access, AI-powered data quality and transformation, multi-destination pipelines, and RAG — in a single open-source package.

Capability  Datris  Airbyte  Fivetran  dbt  Prefect  Dagster  NiFi  Meltano
MCP Server (native) 30+ tools
AI Data Quality
AI Transformation
Data Ingestion
Orchestration Config-driven ~ Limited ~ Limited
Vector DB / RAG 5 DBs
Open Source AGPL-3.0 Core Core
No-Code JSON config ~ UI UI SQL Python Python Visual CLI/YAML
Self-Hosted

Send us a message

Have questions or feedback, or just want to chat? We'd love to hear from you.