Cosdata Roadmap

Detailed roadmap and feature status for Cosdata

1. High-Level Milestones

1.1. Vector Database (Dense / Sparse Vectors with Hybrid Search)

1.1.1. MVP/Alpha: December 15, 2024

  1. Optimized HNSW (dense) and Inverted Index (sparse) implementations
  2. RESTful API for core operations
  3. SIMD-optimized major distance metrics and quantization
  4. Versioning & "Transaction-as-a-resource"
  5. Run basic comparison benchmarks for HNSW & Inverted Index (SPLADE)

1.2. Graph Database Features, Knowledge Graph Integration

1.2.1. MVP/Alpha: January 30, 2025

  1. Basic graph data structures and operations
  2. Simple integration with vector search
  3. Rudimentary CosQL for graph queries

1.2.2. Beta: March 15, 2025

  1. Advanced graph algorithms and knowledge graph features
  2. Enhanced CosQL with graph-specific operations
  3. Basic rule evaluation engine

1.2.3. RC/GA: June 15, 2025

  1. Full graph database capabilities
  2. Seamless integration of graph and vector search
  3. Advanced knowledge graph operations and querying

1.3. Cloud Services and Web Platform

1.3.1. MVP/Alpha: January 15, 2025

  1. Basic containerization and deployment scripts
  2. Simple auto-scaling and monitoring
  3. Prototype of web-based management interface

1.3.2. Beta: June 15, 2025

  1. Multi-cloud support and improved resource management
  2. Enhanced monitoring and basic serverless functions
  3. Development of comprehensive web application
  4. Initial integration with major cloud ecosystems

1.3.3. RC/GA: August 30, 2025

  1. Fully automated deployment and scaling
  2. Production-ready web application with full feature set
  3. Comprehensive management and analytics interface
  4. High availability and redundancy features
  5. Complete integration with major cloud ecosystems

2. Feature Status

2.1. Indexing and Search

  • HNSW indexing for dense vectors with high dimensionality support [COMPLETED] [MVP/ALPHA]
  • Inverted Index for sparse vectors (SPLADE & BM25), supporting very high dimensionality [COMPLETED] [MVP/ALPHA]
  • ANN probabilistic search for Inverted Index [COMPLETED] [MVP/ALPHA]
  • Benchmarking Inverted Index against proprietary data type offerings [IN PROGRESS] [BETA]
  • Optimized hybrid search algorithms [TODO] [MVP/ALPHA]
  • Advanced indexing optimizations [TODO] [BETA]
  • Complete end-to-end comparison benchmarking of HNSW & Inverted Index [TODO] [BETA]
  • Implement re-ranker integration [TODO] [RC/GA]
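
The roadmap does not say how dense (HNSW) and sparse (Inverted Index) results will be combined for hybrid search. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), a common general-purpose fusion technique; it is not confirmed to be Cosdata's approach. The function name, the constant k=60, and the sample document IDs are assumptions made for this example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list (illustrative sketch).

    rankings: list of lists, each ordered best-first (e.g. one from a dense
              index and one from a sparse index).
    k:        smoothing constant; 60 is a commonly used default (assumption).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) to the document's score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a dense and a sparse query.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because RRF only uses rank positions, it needs no score normalization between the two indexes, which is why it is often chosen as a baseline before tuning a weighted scheme.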

2.2. Distance Metrics and Quantization

  • Dot product [COMPLETED] [MVP/ALPHA]
  • Cosine Similarity [COMPLETED] [MVP/ALPHA]
  • Euclidean [COMPLETED] [MVP/ALPHA]
  • Hamming [TODO] [MVP/ALPHA]
  • SIMD optimizations for cosine & dot product metrics [COMPLETED] [MVP/ALPHA]
  • Binary (base 2) quantization [COMPLETED] [MVP/ALPHA]
  • Quaternary (base 4) quantization [COMPLETED] [MVP/ALPHA]
  • Octal (base 8) quantization [COMPLETED] [MVP/ALPHA]
  • U8 (base 256) quantization [COMPLETED] [MVP/ALPHA]
  • Sub-Byte Quantization of Inverted Index [IN PROGRESS] [BETA]
  • SIMD optimizations for all quantization methods [IN PROGRESS] [RC/GA]
  • Implementing auto-configuration for optimal quantization and storage based on statistical sampling [IN PROGRESS] [BETA]
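
For readers unfamiliar with the metrics and quantization levels listed above, here is a minimal plain-Python sketch. It assumes that "base N" means N uniform levels per vector component and that components lie in [-1, 1]; both are assumptions for illustration, and none of the function names come from Cosdata. The real implementation is SIMD-optimized and may differ substantially.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is all zeros)."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def euclidean(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantize(vec, levels, lo=-1.0, hi=1.0):
    """Uniform scalar quantization to integer codes in [0, levels - 1].

    levels=2 is binary, 4 quaternary, 8 octal, 256 U8 (assumed mapping).
    """
    step = (hi - lo) / levels
    codes = []
    for x in vec:
        clamped = min(max(x, lo), hi)
        codes.append(min(int((clamped - lo) / step), levels - 1))
    return codes


def hamming(codes_a, codes_b):
    """Number of positions at which two code sequences differ."""
    return sum(1 for x, y in zip(codes_a, codes_b) if x != y)


v = [0.5, -0.5, 0.9, -1.0]
binary_codes = quantize(v, 2)
quaternary_codes = quantize(v, 4)
```

Fewer levels shrink storage per component (1 bit for binary, 2 bits for quaternary) at the cost of precision, which is the trade-off an auto-configuration step would have to weigh.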

2.3. Storage and Performance

  • Buffered IO, equivalent to memory-mapped files, for efficient caching [COMPLETED] [MVP/ALPHA]
  • Custom storage layer with serialization of index and corresponding file formats [COMPLETED] [MVP/ALPHA]
  • Lazy loading of index nodes, fulfilling DiskANN requirements for low memory use [COMPLETED] [MVP/ALPHA]
  • LRU cache for lazy-loaded items [COMPLETED] [MVP/ALPHA]
  • Separation of compute & storage architecture [COMPLETED] [MVP/ALPHA]
  • Advanced caching strategies [TODO] [BETA]
  • Distributed storage support [TODO] [RC/GA]
  • Implement advanced sharding for multi-billion scale datasets [TODO] [RC/GA]
  • Enhance high availability and redundancy features [TODO] [RC/GA]
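
The combination of lazy loading and an LRU cache mentioned above can be sketched as follows. This is a generic illustration, not Cosdata's storage layer: the class name LazyNodeCache, the loader callback, and the node IDs are all hypothetical.

```python
from collections import OrderedDict


class LazyNodeCache:
    """Loads items on first access and evicts the least recently used one."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # hypothetical callable: node_id -> node
        self._items = OrderedDict()   # insertion order tracks recency
        self.loads = 0                # how many times the loader was called

    def get(self, node_id):
        if node_id in self._items:
            # Cache hit: mark as most recently used.
            self._items.move_to_end(node_id)
            return self._items[node_id]
        # Cache miss: load lazily (e.g. from disk), then store.
        node = self.loader(node_id)
        self.loads += 1
        self._items[node_id] = node
        if len(self._items) > self.capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return node

    def __contains__(self, node_id):
        return node_id in self._items


cache = LazyNodeCache(capacity=2, loader=lambda i: f"node-{i}")
cache.get(1)
cache.get(2)
cache.get(1)   # hit: node 1 becomes most recently used
cache.get(3)   # miss: evicts node 2, the least recently used
```

Bounding the cache this way keeps memory use roughly proportional to the capacity rather than to the index size, which is the property a disk-resident index relies on.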

2.4. Data Management and Versioning

  • Versioning with transaction-based historical revisions and branching [COMPLETED] [MVP/ALPHA]
  • Lazy-loadable collections (Set, Map, Vec, Array, EagerLazyLoad, etc.) [COMPLETED] [MVP/ALPHA]
  • Auto creation of indexes [COMPLETED] [MVP/ALPHA]
  • Advanced versioning features, like branching & related APIs [TODO] [BETA]
  • Improve usability of versioning system [TODO] [BETA]
  • Multi-modal data support [TODO] [RC/GA]
  • Add native support for storing documents and multi-modal data types [TODO] [RC/GA]

2.5. Query and API

  • RESTful API (upsert, ANN, collection create, create index) [COMPLETED] [MVP/ALPHA]
  • Developing user-facing RESTful API for Inverted Index [IN PROGRESS] [BETA]
  • Integrating HNSW hyperparameters API [IN PROGRESS] [BETA]
  • GraphQL API support [TODO] [RC/GA]
  • Implement metadata filtering [TODO] [BETA]

2.6. Graph Database and Knowledge Graph

  • Cos Query Language (CosQL) specification [COMPLETED] [MVP/ALPHA]
  • Rule, Fact, Schema parser for data definition, manipulation & querying [COMPLETED] [MVP/ALPHA]
  • Rule evaluation engine (detailed design document created) [COMPLETED] [MVP/ALPHA]
  • Enhanced CosQL features [TODO] [BETA]
  • Enhance graph database rule evaluation engine and improve performance [TODO] [BETA]
  • Integrate LLM/model for natural language querying of knowledge graphs and relational data [TODO] [RC/GA]
  • Implement Agentic Memory capabilities [TODO] [RC/GA]

2.7. Cloud Integration and Web Application

  • Prototype web-based management interface [TODO] [MVP/ALPHA]
  • Begin development of comprehensive web application [TODO] [BETA]
  • Implement basic serverless functions [TODO] [BETA]
  • Integrate with major cloud ecosystems (initial) [TODO] [BETA]
  • Release production-ready web application [TODO] [RC/GA]
  • Implement advanced serverless functions [TODO] [RC/GA]
  • Fully integrate with major cloud ecosystems [TODO] [RC/GA]
  • Develop comprehensive analytics features in web application [TODO] [RC/GA]

2.8. Integration and Ecosystem

  • Integrate with major text and image vectorization models [TODO] [RC/GA]
  • Integrate with LangChain, LlamaIndex, and similar frameworks [TODO] [RC/GA]
  • Develop web application and cloud serverless integration with major ecosystems [TODO] [RC/GA]

2.9. Security and Access Control

  • Develop authentication and IAM user roles for filtering/joining HNSW and Inverted indexes [TODO] [RC/GA]

2.10. Ongoing Improvements

  • Ongoing bug fixes and performance improvements [IN PROGRESS] [ALL PHASES]