Ingest-and-forget breaks ACLs
Files are dumped into a vector index that has no idea who originally had access. The original SharePoint or Drive permissions disappear at ingestion.
Platform · Knowledge Engine · Permission-aware retrieval
Connect SharePoint, Drive, Confluence, Salesforce, Jira, GitHub and 50+ more. Documents inherit their source permissions. Every query inherits the user's identity.
Hybrid search. Citations on every answer. Permissions enforced before results leave the index.
74 source connectors
Permission-aware retrieval
Hybrid search · vector + keyword
Citations on every answer
Most RAG systems work like this: ingest everything, embed it, search it. The model returns the most similar chunks. Whether the user was supposed to see them or not. The HR investigation surfaces in a finance VP's chat. The salary spreadsheet appears in the intern's research. The platform shipped a data leak with extra steps.
Documents change. Tickets close. Wikis get rewritten. Most RAG systems re-index nightly at best. By morning, the agent is citing yesterday's policy as if it were current.
Pure vector search misses keyword matches the user typed verbatim. Pure keyword misses the semantic match. You need both, plus a reranker. Most setups ship with one.
Connect a source once. The engine handles extraction (text, tables, images via OCR), smart chunking, embedding, and indexing - with the permissions intact. Incremental updates keep it live: a doc edited at 09:14 is searchable by 09:16.
Source authenticated
SharePoint Online
via Microsoft Graph · OAuth
● Connected 2h ago
12,847 documents
Text + tables + images
OCR via Tesseract · 2.4GB
● Done · 14 min
847,210 chunks
Semantic windowing
512-token window · 64 overlap
● Done · 6 min
Vectors generated
847,210 vectors
via your selected embedding tier
● Running · 73%
Searchable + secured
Hybrid index
vector + keyword + ACLs
○ Queued
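A minimal sketch of the chunking stage shown above, assuming plain fixed-size windows of 512 tokens with a 64-token overlap. The function name `window_chunks` and the pre-tokenized input are illustrative assumptions. The engine's "semantic windowing" presumably also respects document structure, which this sketch does not attempt.

```python
from typing import List


def window_chunks(tokens: List[str], window: int = 512, overlap: int = 64) -> List[List[str]]:
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap  # with the defaults, a new window starts every 448 tokens
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Hypothetical 1,000-token document.
tokens = [f"t{i}" for i in range(1000)]
chunks = window_chunks(tokens)
print([len(c) for c in chunks])  # [512, 512, 104]
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of indexing some tokens twice.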
Pure vector search misses the policy ID the user typed verbatim. Pure keyword search misses the synonym. The Knowledge Engine runs both in parallel, fuses the scores, and reranks the top results with a cross-encoder. The result: better recall and better precision than either approach alone.
"What's our policy on remote work in HR-2024-08?"
same query · two strategies
Semantic similarity finds related concepts but misses the specific policy ID the user typed.
Remote Work Best Practices 2023
blog/remote-2023.md
Hybrid Working Guidelines
wiki/hybrid-guide.md
Manager handbook · WFH section
handbook/manager.pdf
Vector + keyword in parallel, fused with RRF, then reranked. The exact policy ID surfaces first.
HR-2024-08 · Remote Work Policy
policies/HR-2024-08.pdf
HR-2024-08 amendment Q3 update
policies/HR-2024-08-amend.pdf
Remote Work Best Practices 2023
blog/remote-2023.md
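The fusion step in the comparison above can be sketched with Reciprocal Rank Fusion. This is an illustration under stated assumptions, not the engine's implementation: the constant `k = 60` is a commonly used default rather than a documented setting, the two input rankings are hypothetical, and the cross-encoder rerank that follows fusion is omitted.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical rankings for "What's our policy on remote work in HR-2024-08?"
vector_ranking = [
    "blog/remote-2023.md",
    "wiki/hybrid-guide.md",
    "handbook/manager.pdf",
    "policies/HR-2024-08.pdf",
    "policies/HR-2024-08-amend.pdf",
]
keyword_ranking = [
    "policies/HR-2024-08.pdf",
    "policies/HR-2024-08-amend.pdf",
]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused[:3])
```

Because RRF works on ranks rather than raw scores, the two retrievers do not need comparable scoring scales. A document that both lists contain, such as the policy PDF here, outranks one that only a single list placed first.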
Six retrieval features that ship by default
Semantic similarity using your chosen embedding model. Finds the meaning regardless of exact wording.
Classical scoring catches the policy ID, the SKU number, the contract clause your user typed verbatim.
Reciprocal Rank Fusion combines the vector and keyword rankings. Neither approach loses what it found.
The top results are scored by a model that reads query and result together. Precision lifts further.
Tables, charts, and images extracted via OCR are searchable alongside the text. Diagrams cite back to the page.
Every result returns the source document, page number, and section. Agents include them in answers automatically.
Thirteen native connectors maintained by Katonic with permission sync from the source. Thirty-plus more via the open-source adapter. Any REST API through OpenAPI import. Anything else through the connector SDK. Most enterprises connect five to ten on day one.
Workplace
CRM · Support · Operations
Developer · Data
Missing a source? Build a connector with the SDK. The engine handles chunking, embedding, permission sync, and incremental updates - you just supply the document fetch and the permission lookup.
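To make the division of labor concrete, here is a sketch of what a custom connector might look like. Every name in it (`Connector`, `SourceDocument`, `fetch_documents`, `lookup_permissions`, `index_with_acls`) is hypothetical and is not the actual SDK interface. The only thing taken from the description above is the split: the connector supplies the document fetch and the permission lookup, and the engine does the rest.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol, Set


@dataclass(frozen=True)
class SourceDocument:
    doc_id: str
    text: str


class Connector(Protocol):
    """Hypothetical connector contract: two methods, nothing else."""

    def fetch_documents(self) -> Iterable[SourceDocument]: ...

    def lookup_permissions(self, doc_id: str) -> Set[str]: ...


class InMemoryWikiConnector:
    """Toy connector backed by dictionaries instead of a real source system."""

    def __init__(self, pages: Dict[str, str], acls: Dict[str, Set[str]]) -> None:
        self._pages = pages
        self._acls = acls

    def fetch_documents(self) -> Iterable[SourceDocument]:
        return [SourceDocument(doc_id, text) for doc_id, text in self._pages.items()]

    def lookup_permissions(self, doc_id: str) -> Set[str]:
        # An unknown document maps to no groups, so nobody can read it.
        return set(self._acls.get(doc_id, set()))


def index_with_acls(connector: Connector) -> List[dict]:
    """Stand-in for the engine side: attach source ACLs to each fetched document."""
    return [
        {
            "doc_id": doc.doc_id,
            "text": doc.text,
            "allowed_groups": connector.lookup_permissions(doc.doc_id),
        }
        for doc in connector.fetch_documents()
    ]


connector = InMemoryWikiConnector({"wiki/a": "Alpha"}, {"wiki/a": {"Eng"}})
records = index_with_acls(connector)
```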
When two people ask the same question, they should get answers grounded in what they're each authorized to read. Permissions sync from your source systems, get attached to chunks at index time, and filter results before they leave the index. No backchannel leak. No "oops, the chatbot revealed something it shouldn't."
"Show me everything you have on the Q3 internal investigation."
two users · two roles · live filter
Sarah Chen
HR Director · sarah.chen@your-org
3 results returned
Q3-2024 Investigation Summary
0.94 · SharePoint · /HR/Investigations/Q3
✓ access via: HR group · Legal group
Witness Statement Compilation
0.89 · Confluence · HR/Cases/2024-Q3
✓ access via: HR group
Outcome Memo · Sept 18, 2024
0.86 · SharePoint · /HR/Memos
✓ access via: HR Director · CHRO
Miguel Reyes
Finance VP · miguel.r@your-org
0 results returned
No accessible documents found
The user's group memberships do not include the source ACLs on the matching documents.
Permission check:
· Q3-2024 Investigation Summary → requires HR or Legal · ✗
· Witness Statement Compilation → requires HR · ✗
· Outcome Memo · Sept 18, 2024 → requires HR Director or CHRO · ✗
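The two-user example above reduces to a set intersection between the groups attached to each chunk and the groups the querying user belongs to. A minimal sketch follows, with the ACLs copied from the permission check above. The `Chunk` type, the `permitted` function, and Miguel's "Finance" group are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Chunk:
    title: str
    score: float
    allowed_groups: FrozenSet[str]  # copied from the source ACL at index time


def permitted(chunks: List[Chunk], user_groups: FrozenSet[str]) -> List[Chunk]:
    """Keep only chunks whose ACL shares at least one group with the user."""
    return [c for c in chunks if c.allowed_groups & user_groups]


INDEX = [
    Chunk("Q3-2024 Investigation Summary", 0.94, frozenset({"HR", "Legal"})),
    Chunk("Witness Statement Compilation", 0.89, frozenset({"HR"})),
    Chunk("Outcome Memo · Sept 18, 2024", 0.86, frozenset({"HR Director", "CHRO"})),
]

sarah_groups = frozenset({"HR", "HR Director"})
miguel_groups = frozenset({"Finance"})  # assumed membership for the Finance VP

print(len(permitted(INDEX, sarah_groups)))   # 3
print(len(permitted(INDEX, miguel_groups)))  # 0
```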
How it works · Four guarantees
Native connectors pull permissions directly from SharePoint, Google Drive, Confluence, and Salesforce. Sync latency is on the order of minutes.
The user's identities across sources (SSO, email, source-specific IDs) resolve into a single authorization set per query.
When a user is permitted to see fewer than 100K documents, the filter runs before search. Above that, a post-filter backed by high-throughput data structures keeps latency flat.
If the permission service is unreachable, queries return empty. Not unfiltered with a warning. The default is a security decision, not a configuration suggestion.
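The choice between filtering before and after search can be sketched as follows. This is an illustration under assumptions: a dictionary of precomputed scores stands in for the index, the names `search_with_acl` and `PREFILTER_LIMIT` are hypothetical, and the "high-throughput data structures" used above the threshold are replaced by a plain set lookup.

```python
from typing import Dict, List, Set

PREFILTER_LIMIT = 100_000  # threshold taken from the guarantee above


def search_with_acl(
    scores: Dict[str, float],
    permitted_ids: Set[str],
    top_k: int = 3,
    prefilter_limit: int = PREFILTER_LIMIT,
) -> List[str]:
    """Return the top_k permitted documents, choosing the filter strategy by ACL size."""
    if len(permitted_ids) < prefilter_limit:
        # Pre-filter: restrict the candidate set first, then rank what is left.
        candidates = {d: s for d, s in scores.items() if d in permitted_ids}
        return sorted(candidates, key=candidates.get, reverse=True)[:top_k]
    # Post-filter: rank everything, then walk down the ranking keeping permitted hits.
    results: List[str] = []
    for doc_id in sorted(scores, key=scores.get, reverse=True):
        if doc_id in permitted_ids:
            results.append(doc_id)
            if len(results) == top_k:
                break
    return results


scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}
allowed = {"b", "d"}
pre = search_with_acl(scores, allowed)                      # pre-filter path
post = search_with_acl(scores, allowed, prefilter_limit=1)  # forces the post-filter path
```

Both paths return the same documents. The difference is cost: pre-filtering is cheap when the permitted set is small, while post-filtering avoids building a very large candidate set for users who can read most of the index.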
WHEN PERMISSIONS CAN'T BE CHECKED
The engine stops.
If the permission service is unreachable for any reason, the engine returns empty search results. It does not fall back to returning unfiltered results with a warning.
This was decided at architecture time, not at configuration time. An empty response is recoverable. Showing a user a document they aren't allowed to see is not.
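Fail-closed behavior is simple to express in code. In this sketch the exception type, `fail_closed_search`, and the two lookup functions are hypothetical stand-ins. The one property carried over from the text is that an unreachable permission service yields an empty result, never an unfiltered one.

```python
from typing import Callable, FrozenSet, List


class PermissionServiceUnavailable(Exception):
    """Raised by the (hypothetical) permission client when it cannot answer."""


def fail_closed_search(
    user: str,
    query: str,
    lookup_groups: Callable[[str], FrozenSet[str]],
    search: Callable[[str, FrozenSet[str]], List[str]],
) -> List[str]:
    """Run a permission-filtered search; return nothing if permissions are unknown."""
    try:
        groups = lookup_groups(user)
    except PermissionServiceUnavailable:
        return []  # empty is recoverable, a leaked document is not
    return search(query, groups)


def healthy_lookup(user: str) -> FrozenSet[str]:
    return frozenset({"HR"})


def broken_lookup(user: str) -> FrozenSet[str]:
    raise PermissionServiceUnavailable("permission service timed out")


def demo_search(query: str, groups: FrozenSet[str]) -> List[str]:
    return ["Q3-2024 Investigation Summary"] if "HR" in groups else []
```

Note that there is no code path in which `search` runs without a resolved group set, so an outage cannot degrade into an unfiltered query.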
Connect sources at /knowledge. Search across them with permission-aware filters. Watch quality drift in real time. The chrome below is what your team sees on day one.
Knowledge
74 connectors · 11 categories · permission-aware retrieval · audit on every read
Connected sources: 12 of 74 available
Total documents: 242,915 indexed across all sources
Sync health: 10/12 (1 syncing · 1 failed · 1 paused)
Avg sync latency: 2.4m incremental update window
Source · Category · Docs · Last sync · Agents
Confluence · Knowledge Base · 8,421 · 12m ago · 14
SharePoint · Knowledge Base · 14,826 · syncing · 23
Notion · Knowledge Base · 3,421 · 8m ago · 8
Google Drive · Cloud Storage · 22,184 · 5m ago · 31
AWS S3 · Cloud Storage · 6,892 · 1h ago · 4
Jira · Ticketing · 12,421 · 3m ago · 11
Zendesk · Ticketing · 0 · auth error · 0
Slack · Messaging · 142,847 · 1m ago · 9
Microsoft Teams · Messaging · 28,412 · paused · 0
Salesforce · Sales · 4,128 · 14m ago · 6
GitHub · Code · 18,421 · 2m ago · 5
Snowflake · Data Source · 942 · 30m ago · 3
Showing 12 connected sources of 74 available · click any card for the indexing detail view
/knowledge in your sandbox today, including all 74 connectors with their real category groupings, the permission-filtered hybrid search, and the citation drift detector.
When you build an agent, point it at the knowledge sources it should use. The agent searches on the user's behalf, with the user's permissions, and includes citations on every grounded claim. No coding required. Toggle on, ship it.
STEP 04 OF 07 · KNOWLEDGE
● 3 BOUND
Knowledge sources
The agent will retrieve from these sources during conversation. Add up to 10.
✓ SharePoint
● synced 2m ago
/HR · /HR/Policies · /HR/Investigations
refresh: 60s incremental
✓ Confluence
● synced 1m ago
Space: HR · Space: People Ops
refresh: 60s incremental
✓ Google Drive
● synced 4m ago
Folder: HR Public · Folder: Onboarding
refresh: 60s incremental
Use end-user identity
Searches run as the user. The user's source permissions filter results.
Require citations in response
The agent must cite a source on every grounded claim.
Refuse if no relevant context found
If retrieval returns nothing, the agent says so.
WORKROOM · AGENT IN USE
What's our current policy on remote work?
Our current remote work policy is documented in HR-2024-08. Employees can work remotely up to 3 days per week with manager approval, with all hands required in-office on Tuesdays and Thursdays[1].
The September Q3 amendment added a per-team override: managers can require additional in-office days during launch periods[2]. International remote work is permitted up to 30 days per year with HR pre-approval[3].
3 sources cited · all accessible to Sarah
[1] HR-2024-08 · Remote Work Policy · p.3 · Section 2.1 · SharePoint /HR/Policies
[2] HR-2024-08 amendment Q3 · p.1 · SharePoint /HR/Policies
[3] HR FAQ · International Remote · updated Sept 2024 · Confluence · HR space
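The "require citations" and "refuse if no relevant context found" toggles can be pictured as a guard that runs on the drafted answer before it reaches the user. The sketch below is an assumption about how such a guard could work, not the platform's code: `guard_answer` is a hypothetical name, and it only checks that the draft carries at least one `[n]` marker and that every marker points at a retrieved source, which is weaker than verifying each individual claim.

```python
import re
from typing import List

REFUSAL = "I don't have access to documents that match."


def guard_answer(draft: str, sources: List[str]) -> str:
    """Refuse when retrieval was empty; reject drafts with missing or dangling citations."""
    if not sources:
        return REFUSAL
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", draft)}
    if not cited:
        raise ValueError("draft contains no citations")
    dangling = sorted(n for n in cited if not 1 <= n <= len(sources))
    if dangling:
        raise ValueError(f"citations without a source: {dangling}")
    return draft


sources = ["HR-2024-08 · Remote Work Policy"]
answer = guard_answer("Employees can work remotely up to 3 days per week[1].", sources)
empty = guard_answer("Employees can work remotely up to 3 days per week[1].", [])
```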
What you get for binding once
The agent decides when to search. No retrieval pipeline to build. No re-ranking logic to maintain. The agent knows it has knowledge sources and reaches for them when the question warrants.
The user's identity flows through to the search. The agent retrieves only what that user is authorized to see. Two users asking the same question get answers from different document sets.
Configure the agent to refuse responses without source citations. Every grounded claim points back to a document, page, and section. Hallucinations have nowhere to hide.
Knowledge binding is one of the seven steps in the agent builder.
How do I keep this from going stale?
Incremental sync detects changes within 60 seconds and re-embeds only what changed. A document edited in SharePoint is searchable two minutes later. Old chunks are removed automatically. No full rebuild needed.
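One way to re-embed only what changed is to compare a content fingerprint per document against the one stored at the last sync. This is a sketch of that idea, assuming SHA-256 hashes and hypothetical names (`fingerprint`, `plan_sync`). The engine may instead rely on change notifications or delta queries from the source system.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(text: str) -> str:
    """Stable content hash used to detect edits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(indexed: Dict[str, str], current: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare stored fingerprints with current text.

    indexed: doc_id -> fingerprint recorded at the last sync
    current: doc_id -> text as it exists in the source now
    Returns (documents to re-embed, documents whose chunks should be removed).
    """
    to_embed = sorted(d for d, text in current.items() if indexed.get(d) != fingerprint(text))
    to_delete = sorted(d for d in indexed if d not in current)
    return to_embed, to_delete


indexed = {
    "policy.pdf": fingerprint("v1"),
    "faq.md": fingerprint("same"),
    "old.doc": fingerprint("gone"),
}
current = {"policy.pdf": "v2", "faq.md": "same", "new.md": "hello"}
to_embed, to_delete = plan_sync(indexed, current)
print(to_embed, to_delete)  # ['new.md', 'policy.pdf'] ['old.doc']
```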
How do you prove the chatbot didn't leak?
Every retrieval is logged with the user, query, returned document IDs, and the permission groups that authorized each. If a regulator asks whether anyone outside the HR group ever surfaced an HR document, you can answer with a single query.
Can I see who saw what?
Yes. Retrieval logs are exportable to your SIEM. Each record includes the user, query, document ID, source path, permission groups checked, and outcome. The logs map directly to access-monitoring controls in ISO 27001, HIPAA, and GDPR Article 30.
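The regulator's question, whether an HR document ever surfaced through a non-HR group, becomes a one-line filter over the retrieval logs. The record layout below is an assumption built from the fields listed above. The field names, the `/HR/` path prefix, and the `"returned"` outcome value are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class RetrievalLog:
    user: str
    query: str
    document_id: str
    source_path: str
    authorized_by: FrozenSet[str]  # groups that granted access; empty if denied
    outcome: str                   # "returned" or "denied"


def hr_docs_surfaced_outside_hr(logs: List[RetrievalLog]) -> List[RetrievalLog]:
    """Records where an /HR/ document was returned without the HR group authorizing it."""
    return [
        r for r in logs
        if r.outcome == "returned"
        and r.source_path.startswith("/HR/")
        and "HR" not in r.authorized_by
    ]


logs = [
    RetrievalLog("sarah.chen", "q3 investigation", "doc-1", "/HR/Investigations/Q3",
                 frozenset({"HR"}), "returned"),
    RetrievalLog("miguel.r", "q3 investigation", "doc-1", "/HR/Investigations/Q3",
                 frozenset(), "denied"),
]
print(hr_docs_surfaced_outside_hr(logs))  # []
```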
Why does the agent say it can't find something?
Because the source it would have come from isn't shared with you. The agent will say so directly: "I don't have access to documents that match." If you need access, request it through the normal channels - the source system stays the source of truth.
Works for demos. Fails compliance.
Pull a vector DB and a chunking script off the shelf. Build it yourself. Permissions are a problem you'll solve later. Compliance review will find them.
Works at small scale. Black box.
A SaaS RAG endpoint. Documents land in their multi-tenant index. Permissions are an afterthought you configure in their console. Their tenant boundary is your tenant boundary.
Permission-aware by construction.
50-plus connectors. Permissions sync from source. Hybrid search with reranking. Citations on every answer. Audit-grade retrieval logs. Fail-closed by default. Lives in your boundary.
Every enterprise has the same data leak risk waiting in their AI pilot. The HR file the chatbot wasn't supposed to surface. The salary sheet that ended up in the model's context. RAG that ignores permissions is not RAG that's missing a feature. It's a compliance disaster with extra steps. Build it permission-aware or don't build it.
Sandbox access in 24 hours. Comes pre-loaded with sample SharePoint and Drive content with realistic permissions, plus two test users so you can see how the same query produces two different result sets.
Bring your own source after that. The first connector takes about 30 minutes.
