
The First Agent-Native
Data Platform

The ingestion and data-quality layer that feeds the stores you already run — driven by AI agents over MCP, developers, and analysts. Datris sits beside your warehouse and your lake. It doesn't replace them.


Datris Assistant online

I'm looking for corporate earnings data.

Happy to set that up. Before I build anything, two quick questions so I don't guess:

— Source: SEC EDGAR (free, official), Polygon.io, or Alpha Vantage?

— Scope: which tickers, and how far back?

SEC EDGAR, Mag 7, last 2 years.

▸ 🔑 Inspecting secret sec-edgar ✓

▸ ✨ Creating pipeline corporate-earnings → mongodb ✓

▸ ✨ Creating tap corporate-earnings ✓

Test run looks good — flat fact records flowing across all 7 tickers. Running the real load now.

▸ ▶ Running tap corporate-earnings ✓

Done. 24,217 fact records for the Mag 7 loaded into datris.corporate-earnings in ~4 seconds.

One thing to flag for the recurring runs: each weekly run currently appends all matching facts. If you'd rather upsert, I can set keyFields on the pipeline.

→ Open tap corporate-earnings


Built-in Assistant

Type what you want.
The Assistant builds it.

Datris ships with an Assistant Agent inside the platform UI. Tell it what data you want. It asks a few sharp scoping questions, picks the right source and destination, generates the fetcher, requests credentials securely, runs the job, confirms the rows landed, and lets you query the result — usually in seconds, with no hand-written config.

Clarifies scope before building — picks the right source, destination, and cadence with you
Generates and runs the fetcher (tap) for external APIs and files
Requests credentials through a secure form — never in chat history
Polls job status and confirms rows actually landed before saying "done"
Flags things you didn't ask about — upsert vs append, fair-use policies, schema drift
Answers natural-language questions over the data once it's in
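
The append-versus-upsert choice the Assistant flags can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Datris implements keyFields: the in-memory list stands in for a destination table, the field names are hypothetical, and the EPS values are made-up sample numbers.

```python
from typing import Iterable

Record = dict  # one flat fact record, e.g. {"ticker": ..., "period": ..., "eps": ...}


def append(store: list, batch: Iterable[Record]) -> list:
    """Append every incoming record; re-running the same batch duplicates rows."""
    return store + list(batch)


def upsert(store: list, batch: Iterable[Record], key_fields: tuple) -> list:
    """Overwrite records whose key already exists and insert the rest."""
    by_key = {tuple(r[k] for k in key_fields): r for r in store}
    for record in batch:
        by_key[tuple(record[k] for k in key_fields)] = record
    return list(by_key.values())


# Hypothetical sample data: week 2 re-delivers week 1's facts plus one new one.
week_1 = [
    {"ticker": "AAPL", "period": "2024Q4", "eps": 1.0},
    {"ticker": "MSFT", "period": "2024Q4", "eps": 2.0},
]
week_2 = week_1 + [{"ticker": "AAPL", "period": "2025Q1", "eps": 3.0}]

keys = ("ticker", "period")  # assumed key fields for this example
appended = append(append([], week_1), week_2)
upserted = upsert(upsert([], week_1, keys), week_2, keys)

print(len(appended))  # 5: the two week-1 facts are stored twice
print(len(upserted))  # 3: one row per (ticker, period)
```

With append, every weekly run grows the dataset by the full result set. With key fields set, repeated facts overwrite themselves and only new periods add rows.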


Things you can say to it

"I'm looking for corporate earnings data."
"Ingest these PDFs into a vector store for RAG."
"Refresh treasury yields from FRED nightly and let me query trends."

Every request becomes a durable pipeline, schedule, and dataset you can audit, hand off, or query from the rest of the platform — not chat-only state.

AI-Powered

Intelligence at every stage

Every step of your data pipeline is enhanced with AI. From ingestion to delivery, Datris makes data engineering accessible through natural language.

🔌

MCP Server

AI Agent Integration

The first open-source data pipeline platform with native Model Context Protocol (MCP) support. Agents can register pipelines, upload files, trigger jobs, profile data, and run searches.

✓

AI Data Quality

Plain English Validation

Validate with plain English rules via aiRule. AI evaluates every row using reasoning and domain knowledge — no regex required.
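
To make the idea concrete, here is roughly what a plain-English rule such as "all account_ids must be present and balances must be non-negative" (the rule from the example prompt on this page) would look like if you wrote it by hand. This is a sketch of the equivalent deterministic check, not the aiRule implementation; the field name `balance` and the sample rows are assumptions.

```python
def validate_row(row: dict) -> list:
    """Return the reasons a row fails; an empty list means it passes."""
    problems = []
    if row.get("account_id") in (None, ""):
        problems.append("account_id is missing")
    balance = row.get("balance")  # assumed field name
    if not isinstance(balance, (int, float)) or balance < 0:
        problems.append("balance must be a non-negative number")
    return problems


# Hypothetical sample rows.
rows = [
    {"account_id": "A-100", "balance": 2500.0},
    {"account_id": "", "balance": 10.0},
    {"account_id": "A-102", "balance": -5.0},
]

report = {index: validate_row(row) for index, row in enumerate(rows)}
rejected = [index for index, problems in report.items() if problems]
print(rejected)  # [1, 2]
```

The appeal of a plain-English rule is that you state the sentence and skip writing and maintaining a function like this for every dataset.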

⚡

AI Transformations

Natural Language Transforms

Describe row transforms in natural language. Date conversion, categorization, entity extraction — no code needed.

📐

AI Schema Generation

Auto-Config from Any File

Upload any CSV, JSON, or XML and get a complete dataset configuration auto-generated. Skip the boilerplate entirely.

📊

AI Data Profiling

Instant Data Insights

Upload a file and get summary statistics, quality issues, and suggested validation rules — all powered by AI analysis.

🔍

AI Error Explanation

Root Cause Analysis

When jobs fail, AI analyzes the error chain and explains the root cause in plain English. No more digging through stack traces.

Architecture

Push and pull — one platform, two interfaces

Datris owns data acquisition, validation, normalization, storage, and observability. Your agents focus on reasoning and decisions — not on solving integration problems for the hundredth time. The boundary gets more valuable with every new source and every new agent.

Ingestion Pipeline

Sources

Push · Real-time

Data Upload

MinIO (events)

Kafka (streaming)

Pull · Taps

API Tap

DB Tap

File Tap

Document Tap

Custom Script

Processing

1 Preprocessor

2 Data Quality (AI)

3 Transformation (AI)

4 Extract / Chunk / Embed (Docs)

Storage

MinIO (Parquet/ORC)

S3 (Parquet/ORC)

Snowflake

Databricks

PostgreSQL

MongoDB

Kafka

ActiveMQ

REST API

Qdrant

Weaviate

Milvus

Chroma

pgvector

Notification

ActiveMQ

MCP — AI Agent Interface

Push

Create Pipelines

Create & Schedule Taps

Upload Data

Trigger Processing

Configure Pipelines

MCP stdio · SSE

Claude · Cursor · OpenClaw · Any MCP Agent

Pull

Query PostgreSQL

Query MongoDB

Query Snowflake

Query Databricks

Semantic Search (Vector DB)

Generate Schemas

Monitor Jobs

Profile Data

Retrieve Results

Infrastructure

Self-host on open source

Prefer to run it yourself? Datris is fully open source. Built on proven infrastructure — no proprietary services, no vendor lock-in, no surprise bills.

 $ git clone https://github.com/datris/datris-platform-oss.git
 $ cp .env.example .env
 # Add your API key (at least one required for AI features)
 $ docker compose up -d
 $ curl http://localhost:8080/api/v1/version

Clone. Configure. Launch. Your full pipeline in under a minute.
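
If you script the setup, the containers may need a moment before the version endpoint answers. A minimal sketch of a readiness check, assuming only the URL from the curl command above; the retry count and delay are arbitrary, and the response body is returned as-is because its format is not documented here.

```python
import time
import urllib.request
from typing import Callable, Optional


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def wait_for_platform(
    url: str = "http://localhost:8080/api/v1/version",
    fetch: Callable[[str], str] = http_get,
    attempts: int = 30,
    delay_seconds: float = 2.0,
) -> str:
    """Poll the version endpoint until it answers, then return the raw body."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return fetch(url)
        except OSError as error:  # URLError and refused connections are OSErrors
            last_error = error
            time.sleep(delay_seconds)
    raise TimeoutError(f"{url} did not respond after {attempts} attempts") from last_error


if __name__ == "__main__":
    print(wait_for_platform())
```

The `fetch` parameter exists only so the loop can be exercised without a running stack.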

RAG Pipeline

Full RAG pipeline built in

Extract, chunk, embed, and upsert documents into any major vector database. Build retrieval-augmented generation workflows without leaving your pipeline.

🗄️

5 Vector Databases

Qdrant · Weaviate · Milvus · Chroma · pgvector

✂️

Chunking Strategies

Choose the right chunking strategy for your use case:

🧠

Embedding Providers

Generate embeddings with cloud or local models:

📄

Document Extraction

Extract text from virtually any document format:

📒

Document Ledger

Content-hashed ledger of every document seen. Re-runs skip unchanged files automatically — no re-embedding the same folder every night, no surprise OpenAI bill.
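
The idea behind a content-hashed ledger is simple enough to sketch. The version below is an illustration under stated assumptions, not the Datris implementation: the ledger is a plain dictionary keyed by file path, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 of the file's bytes, read in blocks to keep memory use flat."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def files_to_process(paths: list, ledger: dict) -> list:
    """Return only new or changed files and record their hashes in the ledger."""
    changed = []
    for path in paths:
        digest = content_hash(path)
        if ledger.get(str(path)) != digest:
            changed.append(path)
            ledger[str(path)] = digest
    return changed
```

Calling `files_to_process` twice on an untouched folder returns an empty list the second time, so nothing is re-extracted or re-embedded. Editing a file changes its hash and puts it back in the queue.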

RAG Pipeline Flow

1 Document Extraction

→

2 Chunking

→

3 Embeddings

→

4 Vector Upsert
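
The flow above can be sketched end to end in plain Python. Everything here is a stand-in: the text is assumed to be already extracted, the chunker is a simple fixed-size window with overlap, `toy_embed` is a hashed bag-of-words rather than a real embedding model, and the "vector database" is a dictionary.

```python
import hashlib
import math


def chunk(text: str, size: int = 200, overlap: int = 40) -> list:
    """Fixed-size character windows that overlap so context spans chunk edges."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str, dims: int = 16) -> list:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vector = [0.0] * dims
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dims
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def ingest(doc_id: str, text: str, index: dict) -> int:
    """Chunk, embed, and upsert; stable IDs make re-ingestion overwrite in place."""
    pieces = chunk(text)
    for number, piece in enumerate(pieces):
        index[f"{doc_id}:{number}"] = {"vector": toy_embed(piece), "text": piece}
    return len(pieces)


index = {}
count = ingest("handbook", "word " * 100, index)  # 500 characters of sample text
print(count)       # 3
print(len(index))  # 3
```

Deriving each chunk ID from the document ID and chunk position is what makes step 4 an upsert: ingesting the same document again replaces its vectors instead of adding duplicates.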

Model Context Protocol

Your AI agents are
first-class pipeline operators

Datris ships with a native MCP server. Claude, Cursor, OpenClaw, and any MCP-compatible AI agent can register pipelines, trigger jobs, and query your structured, document, and vector data in real time — all through natural conversation.

Transports: stdio, SSE (Server-Sent Events)

Compatible agents: Claude, Cursor, OpenClaw, and any MCP-compatible agent

MCP Capabilities

Register pipelines and generate schemas from sample data
Create, schedule, and run AI-generated taps
Ingest documents into vector databases (extract → chunk → embed)
Upload data for processing
Trigger and monitor pipeline jobs
Profile data and get AI insights
Semantic search across vector databases
Query PostgreSQL and MongoDB directly
Manage credentials via Vault — without ever holding the key
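
Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages, and over stdio each message is one line of JSON. The sketch below builds two such requests. `tools/list` and `tools/call` are standard MCP methods, but the tool name `trigger_job` and its arguments are hypothetical, since the actual Datris tool names are not listed here. A real session also begins with an `initialize` handshake, which is omitted.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests need a unique id per session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialise one JSON-RPC 2.0 request as a single line, ready for stdio."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)  # compact output contains no raw newlines


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Call a tool; the name and arguments below are placeholders.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "trigger_job", "arguments": {"pipeline": "corporate-earnings"}},
)

print(list_tools)
print(call_tool)
```

In practice an MCP-compatible agent or client library handles this framing for you; the point is only that every capability in the list above is reached through the same small set of message shapes.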

Example prompt

"Generate a tap for our prime broker margin API, schedule it daily, validate that 'all account_ids must be present and balances must be non-negative', and load into Postgres."

Formats

Speaks every data language

Ingest structured data, unstructured documents, and archives. Output to vector stores, structured stores, or optimized columnar formats.

Format	Input	Default Destination
CSV		SQL DB
JSON		NoSQL DB
XML		NoSQL DB
Excel (.xlsx)		SQL DB
Parquet		SQL DB
ORC		SQL DB
PDF		Vector DB
Word (.docx)		Vector DB
PowerPoint (.pptx)		Vector DB
HTML		Vector DB
Email (.eml)		Vector DB
EPUB		Vector DB
Archives (.zip, .tar)		Unpacked, routed
Plain Text		Vector DB

Destinations are fully configurable. Route any format to any target — SQL databases, NoSQL stores, vector databases, REST endpoints, Kafka topics, or ActiveMQ queues.
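
The default routing in the table amounts to a lookup with overrides. A minimal sketch, assuming routing by file extension; the extension spellings for HTML, EPUB, and plain text, the destination labels, and the override mechanism are illustrative choices, not Datris configuration.

```python
from pathlib import Path
from typing import Optional

# Defaults taken from the table above; the keys are assumed file extensions.
DEFAULT_DESTINATION = {
    ".csv": "sql", ".xlsx": "sql", ".parquet": "sql", ".orc": "sql",
    ".json": "nosql", ".xml": "nosql",
    ".pdf": "vector", ".docx": "vector", ".pptx": "vector", ".html": "vector",
    ".eml": "vector", ".epub": "vector", ".txt": "vector",
    ".zip": "unpack", ".tar": "unpack",  # archives are unpacked, then routed
}


def destination_for(filename: str, overrides: Optional[dict] = None) -> str:
    """Pick a destination type for a file, letting overrides beat the defaults."""
    suffix = Path(filename).suffix.lower()
    routes = {**DEFAULT_DESTINATION, **(overrides or {})}
    if suffix not in routes:
        raise ValueError(f"no route configured for {suffix or filename!r}")
    return routes[suffix]


print(destination_for("q3-report.PDF"))                  # vector
print(destination_for("trades.csv"))                     # sql
print(destination_for("trades.csv", {".csv": "kafka"}))  # kafka
```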

AI Providers

Your choice of AI model

Use cloud AI from Anthropic or OpenAI, or keep everything local with Ollama. No vendor lock-in — switch providers without changing your pipeline config.

Anthropic Claude

Claude Opus, Sonnet, Haiku, and more

OpenAI

GPT, o-series, and embedding models

⊙

Ollama

Run local models — Llama, Mistral, Phi, and more

Comparison

How Datris compares

The only platform combining MCP-native agent access; AI-generated taps; AI-powered schema generation, data quality, and transformation; multi-destination pipelines; and document RAG, all in a single open-source package.

Capability	Datris	Airbyte	Fivetran	dbt	Prefect	Dagster	NiFi	Meltano
MCP Server (native)	40+ tools
AI-Generated Taps	Python fetchers
AI Schema Generation	From file
AI Data Quality	Plain-English
AI Transformation	Plain-English
Data Ingestion
Multi-Destination Pipelines	Parallel writes				~ DIY	~ DIY
Document RAG Pipeline	Extract → embed
Vector DB Destinations	5 DBs	~ Few
Vault Secrets	Built-in	~ Add-on	~		~ Add-on	~ Add-on	~
Orchestration	Config-driven	~ Limited	~ Limited
Open Source	AGPL-3.0	Core		Core
No-Code	JSON / MCP / UI	~ UI	UI	SQL	Python	Python	Visual	CLI/YAML
Self-Hosted

Get Started

Connect your agent in 60 seconds

Get an API key, a dedicated MCP endpoint, the REST API, and the full platform UI instantly.


Contact

Send us a message

Questions, feedback, or just want to chat — we'd love to hear from you.
