Agent description

An agent for ingesting and querying structured and unstructured content. It accepts PDF, CSV, JSON, and TXT files, extracts text and metadata, normalizes the content, and imports it into a semantic graph so that complex queries can run over the relationships between pieces of information.

Key features

  • Supported formats: PDF (with optional OCR), CSV, JSON, TXT.
  • Extraction: parsing of tabular structures, extraction of text and metadata (author, timestamp, column names, etc.).
  • Cleaning and normalization: noise removal, tokenization, segmentation into semantic chunks.
  • Enrichment: extraction of entities, relationships, and semantic annotations to create nodes and edges.
  • Graph ingestion: mapping entities to nodes, creating semantic edges, and storing them in the graph backend.
  • Graph queries: search by entity, paths between nodes, pattern matching, and aggregations; both free-text and structured queries are supported.
  • Traceability: preserving references to original documents and text blocks to reconstruct context.
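The cleaning, chunking, and traceability features can be illustrated with a small sketch. It is not the agent's actual implementation: the `Chunk` fields and the `normalize` and `chunk_paragraphs` helpers are hypothetical names chosen for this example, and paragraph splitting stands in for whatever semantic segmentation the agent really uses. The point it shows is that each chunk keeps its document id and character offsets, so the original context can be reconstructed later.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A text block plus the reference needed to trace it back (hypothetical schema)."""
    doc_id: str  # identifier of the source document
    index: int   # position of the chunk within the document
    start: int   # character offset in the normalized text (inclusive)
    end: int     # character offset in the normalized text (exclusive)
    text: str


def normalize(text: str) -> str:
    """Collapse runs of spaces and tabs and trim every line; blank lines are kept."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(lines).strip()


def chunk_paragraphs(doc_id: str, text: str) -> list[Chunk]:
    """Split normalized text on blank lines, recording offsets for traceability."""
    clean = normalize(text)
    chunks = []
    # A paragraph is one non-empty line followed by any number of non-empty lines.
    for match in re.finditer(r"[^\n]+(?:\n[^\n]+)*", clean):
        chunks.append(
            Chunk(doc_id, len(chunks), match.start(), match.end(), match.group())
        )
    return chunks


raw = "  Alice   met Bob. \n\n\nBob works\tat Acme.\nSince 2020. "
chunks = chunk_paragraphs("doc-1", raw)
for chunk in chunks:
    print(chunk.index, (chunk.start, chunk.end), repr(chunk.text))
```

Because the offsets refer to the normalized text, slicing `normalize(raw)[chunk.start:chunk.end]` returns exactly the chunk text, which is what a node or edge would point back to.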

Benefits

  • Transforms diverse document formats into a queryable graph representation.
  • Enables semantic analysis and discovery of hidden relationships across heterogeneous data.
  • Eases integration with existing pipelines (e.g., kgrag-store, MCP server) for advanced search and reasoning.

Example workflow

  1. File reception
  2. Extraction and normalization
  3. Entity and relationship extraction
  4. Graph ingestion
  5. Query execution and result retrieval
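The five steps can be sketched end to end with the standard library alone. Everything here is an assumption made for illustration: the `Graph` class, `ingest`, `extract_text`, and `extract_entities` are hypothetical names, an in-memory dictionary stands in for the graph backend, capitalized words stand in for real entity extraction, and co-occurrence in a sentence stands in for relationship extraction. PDF handling is left out.

```python
import csv
import io
import json
import re
from collections import defaultdict, deque
from itertools import combinations


def _strings(value):
    """Yield every string found inside a parsed JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from _strings(item)


def extract_text(filename: str, payload: str) -> str:
    """Step 2: turn a CSV, JSON, or TXT payload into plain text."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext == "csv":
        return "\n".join(" ".join(row) for row in csv.reader(io.StringIO(payload)))
    if ext == "json":
        return " ".join(_strings(json.loads(payload)))
    return payload


def extract_entities(sentence: str) -> list[str]:
    """Step 3 stand-in: treat capitalized words as entities."""
    return sorted(set(re.findall(r"\b[A-Z][a-z]+\b", sentence)))


class Graph:
    """In-memory stand-in for the graph backend."""

    def __init__(self):
        self.edges = defaultdict(set)    # node -> neighbouring nodes
        self.sources = defaultdict(set)  # (node, node) -> {(filename, sentence index)}

    def add_edge(self, a, b, source):
        self.edges[a].add(b)
        self.edges[b].add(a)
        self.sources[tuple(sorted((a, b)))].add(source)

    def path(self, start, goal):
        """Step 5: shortest path between two entities (breadth-first search)."""
        queue, seen = deque([[start]]), {start}
        while queue:
            route = queue.popleft()
            if route[-1] == goal:
                return route
            for nxt in sorted(self.edges.get(route[-1], ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(route + [nxt])
        return None


def ingest(graph: Graph, filename: str, payload: str) -> None:
    """Steps 1-4: receive a file, extract text, find entities, add edges."""
    text = extract_text(filename, payload)
    for i, sentence in enumerate(re.split(r"[.\n]+", text)):
        for a, b in combinations(extract_entities(sentence), 2):
            graph.add_edge(a, b, (filename, i))  # keep the source for traceability


graph = Graph()
ingest(graph, "notes.txt", "Alice works with Bob. Bob reports to Carol.")
ingest(graph, "staff.csv", "name,manager\nCarol,Dana\n")
print(graph.path("Alice", "Dana"))  # ['Alice', 'Bob', 'Carol', 'Dana']
```

The path query crosses both files: the first two hops come from `notes.txt` and the last from `staff.csv`, and `graph.sources` records which file and sentence produced each edge.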

Dependencies

Development

A2A

Docker