Cosdata Roadmap
Detailed roadmap and feature status for Cosdata
Table of Contents
- 1. High-Level Milestones
- 2. Feature Status
- 2.1. Indexing and Search
- 2.2. Distance Metrics and Quantization
- 2.3. Storage and Performance
- 2.4. Data Management and Versioning
- 2.5. Query and API
- 2.6. Graph Database and Knowledge Graph
- 2.7. Cloud Integration and Web Application
- 2.8. Integration and Ecosystem
- 2.9. Security and Access Control
- 2.10. Ongoing Improvements
1. High-Level Milestones
1.1. Vector Database (Dense / Sparse Vectors with Hybrid Search)
1.1.1. MVP/Alpha: December 15, 2024
- Optimized HNSW (dense) and Inverted index (sparse) implementations
- RESTful API for core operations
- SIMD optimized major distance metrics and quantization
- Versioning & "Transaction-as-a-resource"
- Run basic comparison benchmarks for HNSW & Inverted index (SPLADE)
1.2. Graph Database Features, Knowledge Graph Integration
1.2.1. MVP/Alpha: January 30, 2025
- Basic graph data structures and operations
- Simple integration with vector search
- Rudimentary CosQL for graph queries
1.2.2. Beta: March 15, 2025
- Advanced graph algorithms and knowledge graph features
- Enhanced CosQL with graph-specific operations
- Basic rule evaluation engine
1.2.3. RC/GA: June 15, 2025
- Full graph database capabilities
- Seamless integration of graph and vector search
- Advanced knowledge graph operations and querying
1.3. Cloud Services and Web Platform
1.3.1. MVP/Alpha: January 15, 2025
- Basic containerization and deployment scripts
- Simple auto-scaling and monitoring
- Prototype of web-based management interface
1.3.2. Beta: June 15, 2025
- Multi-cloud support and improved resource management
- Enhanced monitoring and basic serverless functions
- Development of comprehensive web application
- Initial integration with major cloud ecosystems
1.3.3. RC/GA: August 30, 2025
- Fully automated deployment and scaling
- Production-ready web application with full feature set
- Comprehensive management and analytics interface
- High availability and redundancy features
- Complete integration with major cloud ecosystems
2. Feature Status
2.1. Indexing and Search
- HNSW indexing for dense vectors with high dimensionality support [COMPLETED] [MVP/ALPHA]
- Inverted Index for sparse vectors (SPLADE & BM25), supporting very high dimensionality [COMPLETED] [MVP/ALPHA]
- Probabilistic ANN search for the Inverted Index [COMPLETED] [MVP/ALPHA]
- Benchmarking Inverted Index against proprietary data type offerings [IN PROGRESS] [BETA]
- Optimized hybrid search algorithms [TODO] [MVP/ALPHA]
- Advanced indexing optimizations [TODO] [BETA]
- Complete end-to-end comparison benchmarking of HNSW & Inverted Index [TODO] [BETA]
- Implement re-ranker integration [TODO] [RC/GA]
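The items above refer to an inverted index for sparse vectors. As a rough illustration of why this structure suits very high-dimensional sparse data, here is a minimal Python sketch assuming a simple in-memory layout: each non-zero dimension maps to a posting list of (vector id, weight) pairs, and a query's dot-product score is accumulated only over the dimensions it actually contains. The `SparseInvertedIndex` class and its `upsert`/`search` methods are hypothetical and do not reflect Cosdata's REST API, storage format, or its probabilistic ANN search; this sketch scores exactly over the posting lists it touches.

```python
from collections import defaultdict
import heapq


class SparseInvertedIndex:
    """Hypothetical in-memory inverted index over sparse vectors ({dim: weight})."""

    def __init__(self):
        # dimension id -> list of (vector id, weight) postings
        self.postings = defaultdict(list)

    def upsert(self, vec_id, sparse_vec):
        # Only non-zero dimensions are stored, so memory grows with the number of
        # non-zeros, not with the nominal dimensionality.
        # (Sketch only: re-upserting the same id would duplicate its postings.)
        for dim, weight in sparse_vec.items():
            if weight != 0.0:
                self.postings[dim].append((vec_id, weight))

    def search(self, query, top_k=3):
        # Accumulate dot-product scores by walking only the query's posting lists.
        scores = defaultdict(float)
        for dim, q_weight in query.items():
            for vec_id, weight in self.postings.get(dim, []):
                scores[vec_id] += q_weight * weight
        # Return up to top_k (vec_id, score) pairs, highest score first.
        return heapq.nlargest(top_k, scores.items(), key=lambda item: item[1])
```

For example, after upserting `{10: 0.5, 2000: 1.2}` as `doc1` and `{10: 0.8, 99999: 0.3}` as `doc2`, the query `{10: 1.0, 2000: 0.5}` touches only two posting lists and ranks `doc1` (score about 1.1) ahead of `doc2` (0.8), no matter how large the dimension ids are.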
2.2. Distance Metrics and Quantization
- Dot product [COMPLETED] [MVP/ALPHA]
- Cosine Similarity [COMPLETED] [MVP/ALPHA]
- Euclidean [COMPLETED] [MVP/ALPHA]
- Hamming [TODO] [MVP/ALPHA]
- SIMD optimizations for cosine & dot product metrics [COMPLETED] [MVP/ALPHA]
- Binary (base 2) quantization [COMPLETED] [MVP/ALPHA]
- Quaternary (base 4) quantization [COMPLETED] [MVP/ALPHA]
- Octal (base 8) quantization [COMPLETED] [MVP/ALPHA]
- U8 (base 256) quantization [COMPLETED] [MVP/ALPHA]
- Sub-Byte Quantization of Inverted Index [IN PROGRESS] [BETA]
- SIMD optimizations for all quantization methods [IN PROGRESS] [RC/GA]
- Implementing auto-configuration for optimal quantization and storage based on statistical sampling [IN PROGRESS] [BETA]
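The quantization levels above differ in how many bits each dimension occupies: base 2 needs 1 bit, base 4 needs 2 bits, base 8 needs 3 bits and base 256 (U8) needs 8 bits. The sketch below shows plain uniform scalar quantization in Python, assuming each value is clamped to a fixed range (here [-1, 1]) split into equal-width buckets. The `bits_per_dimension`, `quantize` and `dequantize` helpers and the fixed range are illustrative assumptions, not Cosdata's actual quantization scheme, which may choose ranges differently.

```python
import math


def bits_per_dimension(levels):
    # Base 2 -> 1 bit, base 4 -> 2 bits, base 8 -> 3 bits, base 256 -> 8 bits.
    return math.ceil(math.log2(levels))


def quantize(values, levels, lo=-1.0, hi=1.0):
    """Map each float to an integer code in 0..levels-1 using uniform buckets over [lo, hi]."""
    step = (hi - lo) / levels
    codes = []
    for v in values:
        v = min(max(v, lo), hi)               # clamp outliers into the range
        code = int((v - lo) / step)           # index of the bucket containing v
        codes.append(min(code, levels - 1))   # v == hi would land one past the last bucket
    return codes


def dequantize(codes, levels, lo=-1.0, hi=1.0):
    """Reconstruct each code as the midpoint of its bucket."""
    step = (hi - lo) / levels
    return [lo + (c + 0.5) * step for c in codes]
```

With 2 levels, `[-1.0, -0.2, 0.3, 1.0]` maps to codes `[0, 0, 1, 1]`; with 4 levels it maps to `[0, 1, 2, 3]`. More levels shrink the reconstruction error (at most half a bucket width for in-range values) at the cost of more bits per dimension.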
2.3. Storage and Performance
- Buffered I/O with caching equivalent to memory-mapped files [COMPLETED] [MVP/ALPHA]
- Custom storage layer with serialization of index and corresponding file formats [COMPLETED] [MVP/ALPHA]
- Lazy loading of index nodes, meeting DiskANN requirements for low memory usage [COMPLETED] [MVP/ALPHA]
- LRU cache for lazy loaded items [COMPLETED] [MVP/ALPHA]
- Separation of compute & storage architecture [COMPLETED] [MVP/ALPHA]
- Advanced caching strategies [TODO] [BETA]
- Distributed storage support [TODO] [RC/GA]
- Implement advanced sharding for multi-billion scale datasets [TODO] [RC/GA]
- Enhance high availability and redundancy features [TODO] [RC/GA]
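Lazy loading and the LRU cache listed above work together: a node is read from storage only when it is first needed, and a bounded cache keeps recently used nodes in memory while evicting the least recently used one. Below is a minimal Python sketch using `collections.OrderedDict`, assuming a caller-supplied `load_fn` stands in for the disk read; `LazyNodeCache` is a hypothetical name, not Cosdata's implementation.

```python
from collections import OrderedDict


class LazyNodeCache:
    """Hypothetical LRU cache that loads index nodes on first access."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn          # called with a node id on a cache miss (e.g. a disk read)
        self.cache = OrderedDict()      # node id -> node, least recently used first
        self.loads = 0                  # number of times load_fn has been called

    def get(self, node_id):
        if node_id in self.cache:
            self.cache.move_to_end(node_id)    # mark as most recently used
            return self.cache[node_id]
        node = self.load_fn(node_id)           # load lazily, only when first needed
        self.loads += 1
        self.cache[node_id] = node
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict the least recently used node
        return node
```

With `capacity=2`, accessing nodes 1, 2, 1, 3 triggers three loads and evicts node 2, because node 1 was touched more recently.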
2.4. Data Management and Versioning
- Versioning with transaction-based historical revisions and branching [COMPLETED] [MVP/ALPHA]
- Lazy-loadable collections (Set, Map, Vec, Array, EagerLazyLoad, etc.) [COMPLETED] [MVP/ALPHA]
- Auto creation of indexes [COMPLETED] [MVP/ALPHA]
- Advanced versioning features, such as branching & related APIs [TODO] [BETA]
- Improve usability of versioning system [TODO] [BETA]
- Multi-modal data support [TODO] [RC/GA]
- Add native support for storing documents and multi-modal data types [TODO] [RC/GA]
2.5. Query and API
- RESTful API (upsert, ANN, collection create, create index) [COMPLETED] [MVP/ALPHA]
- Developing user-facing RESTful API for Inverted Index [IN PROGRESS] [BETA]
- Integrating HNSW hyperparameters API [IN PROGRESS] [BETA]
- GraphQL API support [TODO] [RC/GA]
- Implement metadata filtering [TODO] [BETA]
2.6. Graph Database and Knowledge Graph
- Cos Query Language (CosQL) specification [COMPLETED] [MVP/ALPHA]
- Rule, Fact, Schema parser for data definition, manipulation & querying [COMPLETED] [MVP/ALPHA]
- Rule evaluation engine (detailed design document created) [COMPLETED] [MVP/ALPHA]
- Enhanced CosQL features [TODO] [BETA]
- Enhance graph database rule evaluation engine and improve performance [TODO] [BETA]
- Integrate an LLM or other model for natural-language querying of knowledge graphs and relational data [TODO] [RC/GA]
- Implement Agentic Memory capabilities [TODO] [RC/GA]
2.7. Cloud Integration and Web Application
- Prototype web-based management interface [TODO] [MVP/ALPHA]
- Begin development of comprehensive web application [TODO] [BETA]
- Implement basic serverless functions [TODO] [BETA]
- Integrate with major cloud ecosystems (initial) [TODO] [BETA]
- Release production-ready web application [TODO] [RC/GA]
- Implement advanced serverless functions [TODO] [RC/GA]
- Fully integrate with major cloud ecosystems [TODO] [RC/GA]
- Develop comprehensive analytics features in web application [TODO] [RC/GA]
2.8. Integration and Ecosystem
- Integrate with major text and image vectorization models [TODO] [RC/GA]
- Integrate with LangChain, LlamaIndex, and similar frameworks [TODO] [RC/GA]
- Develop web application and cloud serverless integration with major ecosystems [TODO] [RC/GA]
2.9. Security and Access Control
- Develop authentication and IAM user roles for filtering/joining HNSW and Inverted indexes [TODO] [RC/GA]
2.10. Ongoing Improvements
- Ongoing bug fixes and performance improvements [IN PROGRESS] [ALL PHASES]