Cos Graph Query Language Specification
1. Schema Definition
1.1. Entity Definition
define entity <entity_name> as <attribute_name>: <data_type> [, <attribute_name>: <data_type>]*;
Example:
define entity person as
    name: string,
    age: int,
    email: string;

define entity company as
    name: string,
    founded_year: int;
1.2. Relationship Definition
define relationship <relationship_name> as (<role1>: <entity_type1>, <role2>: <entity_type2>[, <role3>: <entity_type3>]*), [attribute1: <data_type1>, attribute2: <data_type2>, ...];
- Roles and their associated entity types are enclosed in parentheses.
- Multiple roles are separated by commas.
- Optional attributes can be defined after the roles, each on a new line.
- The definition ends with a semicolon.
Example (binary relationship):
define relationship works_in as (employee: person, department: department);
define relationship manages as (manager: person, department: department);
define relationship assigned_to as (employee: person, project: project);

// Example with attributes
define relationship employment as
    (employee: person, employer: company),
    start_date: date,
    salary: double;
Example (primitive ternary relationship):
// Cannot be decomposed into binary relationships
define relationship contributes_research as
    (author: person,
     research_entity: research_entity, // survey, workshop, talk
     domain: domain),
    date: date; // non-exhaustive list of attributes
1.2.1. Note: Relationship Definitions (Schema Level):
- These are like blueprints or templates for relationships.
- They define the structure: what types of entities can participate and what attributes the relationship can have.
- They're part of the schema, similar to how we define entity types.
2. Data Manipulation
2.1. Entity Instance Insertion
insert $<variable_name> isa <entity_type> ( <attribute_name>: <value> [, <attribute_name>: <value>]* );
Example:
insert $john isa person (
    name: "John Doe",
    age: 30,
    email: "john@example.com"
);

insert $techcorp isa company (
    name: "TechCorp",
    founded_year: 2000
);
2.2. Relationship Instance Insertion
insert $<variable_name> ( <role_name>: $<entity_variable> [, <role_name>: $<entity_variable>]* ) forms <relationship_type> ( <attribute_name>: <value> [, <attribute_name>: <value>]* );
Example (binary relationship):
insert $job1 (employee: $john, employer: $techcorp) forms employment (
    salary: 100000.00,
    start_date: 2022-03-01
);
Example (ternary relationship):
insert $assignment1 (employee: $john, project: $ai_project, department: $tech_dept) forms project_assignment (
    role: "Lead Developer",
    start_date: 2023-01-20
);
2.2.1. Note: Relationship Instances (Data Level):
- These are actual connections between specific entities in your data.
- They can be created directly (like when inserting data) or inferred by rules.
// Relationship Definition (Schema Level)
define relationship employment as
    (employee: person, employer: company),
    start_date: date,
    salary: double;

// Direct Insertion of a Relationship Instance (Data Level)
insert ($john, $techcorp) forms employment (
    start_date: 2023-01-15,
    salary: 75000.00
);

// Rule that Infers a Relationship Instance (Data Level)
define rule infer_management as
match
    $dept isa department ( name: $dept_name, head: $manager_name ),
    $employee isa person ( name: $manager_name ),
    ($employee, $dept) forms works_in
infer derive
    ($employee, $dept) forms manages;
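To make the schema-level versus data-level distinction concrete, the following Python sketch models a relationship definition as a blueprint and checks a candidate instance against it. The names RelationshipDef, check_instance, and entity_types are hypothetical and introduced only for illustration; the specification does not say how an implementation validates instances.

```python
from dataclasses import dataclass


@dataclass
class RelationshipDef:
    """Schema level: the blueprint for a relationship type."""
    name: str
    roles: dict       # role name -> required entity type
    attributes: dict  # attribute name -> data type name


def check_instance(definition, role_players, attributes, entity_types):
    """Data level: return a list of problems; an empty list means the instance fits."""
    problems = []
    for role, entity in role_players.items():
        expected = definition.roles.get(role)
        if expected is None:
            problems.append(f"unknown role: {role}")
        elif entity_types.get(entity) != expected:
            problems.append(f"{entity} is not a {expected}")
    for role in definition.roles:
        if role not in role_players:
            problems.append(f"missing role: {role}")
    for attr in attributes:
        if attr not in definition.attributes:
            problems.append(f"unknown attribute: {attr}")
    return problems


employment = RelationshipDef(
    name="employment",
    roles={"employee": "person", "employer": "company"},
    attributes={"start_date": "date", "salary": "double"},
)
entity_types = {"$john": "person", "$techcorp": "company"}
problems = check_instance(
    employment,
    {"employee": "$john", "employer": "$techcorp"},
    {"start_date": "2023-01-15", "salary": 75000.00},
    entity_types,
)
# problems is an empty list: the instance fits the blueprint
```

The definition is created once and reused for every instance, which mirrors how a schema-level definition constrains all data-level insertions of that relationship type.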
3. Querying
3.1. Basic Query Structure
match
    $<variable> isa <entity_type> ( <attribute_name>: <value_or_variable> [, <attribute_name>: <value_or_variable>]* )
    [, $<relationship_variable> ( <role_name>: $<entity_variable> [, <role_name>: $<entity_variable>]* )
        forms <relationship_type> ( <attribute_name>: <value_or_variable> [, <attribute_name>: <value_or_variable>]* )]*
get $<variable> [, $<variable>]*;
Example (querying a ternary relationship):
match
    $employee isa person ( name: $name ),
    $project isa project ( name: "AI Initiative" ),
    $assignment ( employee: $employee, project: $project, department: $dept )
        forms project_assignment ( start_date: $start_date ),
    $dept isa department ( name: "Tech Department" )
get $name, $start_date;
4. Rules
Rules allow us to define new relationships or entities based on existing ones. They are similar to views in relational databases or derived predicates in Datalog.
4.1. Rule Definition
define rule <rule_name> as match <pattern1>, <pattern2>, ... infer <conclusion>;
4.2. Example: Transitive Closure of Flight Connections
Let's consider a scenario where we have direct flights between cities, and we want to find all reachable destinations, including those requiring multiple flights.
// Base case: A city is reachable if there's a direct flight
define rule reachable_direct as
match
    (from: $city1, to: $city2) forms direct_flight
infer materialize
    (from: $city1, to: $city2) forms reachable;

// Recursive case: A city is reachable if we can reach an intermediate city
define rule reachable_indirect as
match
    (from: $city1, to: $intermediate) forms reachable,
    (from: $intermediate, to: $city2) forms reachable,
    $city1 != $city2 // Prevent trivial cycles
infer materialize
    (from: $city1, to: $city2) forms reachable;
These rules define a new `reachable` relationship:
- The first rule establishes that any direct flight makes the destination reachable.
- The second rule recursively defines that if we can reach an intermediate city, and from there reach a final destination, then that final destination is reachable from the starting city.
We can then use this in queries:
match
    $start isa city ( name: "New York" ),
    $end isa city ( name: $destination ),
    (from: $start, to: $end) forms reachable
get $destination;
This query would return all cities reachable from New York, whether by direct flights or any number of connections.
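One way to read the two reachable rules is as a fixpoint computation: start from the direct flights and keep joining the relation with itself until no new pairs appear. The Python sketch below illustrates that reading under the assumption of naive bottom-up evaluation; the specification does not state which evaluation strategy an engine uses, and the function name reachable_pairs and the sample flights are made up for illustration.

```python
def reachable_pairs(direct_flights):
    """Compute the reachable relation from a set of (origin, destination) pairs."""
    # Base case (reachable_direct): every direct flight is reachable.
    reachable = set(direct_flights)
    changed = True
    while changed:
        changed = False
        # Recursive case (reachable_indirect): join reachable with itself
        # on the intermediate city, skipping pairs where city1 == city2.
        for (city1, mid) in list(reachable):
            for (mid2, city2) in list(reachable):
                if mid == mid2 and city1 != city2 and (city1, city2) not in reachable:
                    reachable.add((city1, city2))
                    changed = True
    return reachable


# Hypothetical sample data
flights = {("New York", "Chicago"), ("Chicago", "Denver"), ("Denver", "Seattle")}
destinations = sorted(to for (frm, to) in reachable_pairs(flights) if frm == "New York")
# destinations == ["Chicago", "Denver", "Seattle"]
```

The loop stops because the set of city pairs is finite and pairs are only ever added.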
4.3. Using Rules with Attributes
We can extend this example to include distance:
define relationship direct_flight as
    (from: city, to: city),
    distance: int;

// Base case with distance
define rule reachable_direct as
match
    (from: $city1, to: $city2) forms direct_flight ( distance: $dist )
infer materialize
    (from: $city1, to: $city2) forms reachable ( distance: $dist );

// Recursive case with distance
define rule reachable_indirect as
match
    (from: $city1, to: $intermediate) forms reachable ( distance: $dist1 ),
    (from: $intermediate, to: $city2) forms reachable ( distance: $dist2 ),
    $city1 != $city2
infer materialize
    (from: $city1, to: $city2) forms reachable ( distance: ($dist1 + $dist2) );
Now we can query for reachable cities within a certain distance:
match
    $start isa city ( name: "New York" ),
    $end isa city ( name: $destination ),
    (from: $start, to: $end) forms reachable ( distance: $dist ),
    $dist < 5000
get $destination, $dist;
This query would return all cities reachable from New York within a distance of 5000 units, along with the total distance.
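The distance-carrying rules can be sketched the same way, with each fact now a (from, to, distance) triple. Because a route network with cycles admits ever-longer paths, this sketch applies the distance bound during evaluation so that it terminates; that bound, the function name reachable_with_distance, and the sample distances are assumptions for illustration, not behavior the specification defines.

```python
def reachable_with_distance(direct_flights, max_distance):
    """direct_flights: set of (origin, destination, distance) with positive distances.

    Returns every (origin, destination, total_distance) fact derivable by the
    two rules whose total distance is strictly below max_distance.
    """
    # Base case: direct flights within the bound.
    facts = {f for f in direct_flights if f[2] < max_distance}
    changed = True
    while changed:
        changed = False
        # Recursive case: chain two reachable facts and add their distances.
        for (city1, mid, dist1) in list(facts):
            for (mid2, city2, dist2) in list(facts):
                total = dist1 + dist2
                new_fact = (city1, city2, total)
                if (mid == mid2 and city1 != city2
                        and total < max_distance and new_fact not in facts):
                    facts.add(new_fact)
                    changed = True
    return facts


# Hypothetical distances in arbitrary units
sample = {("A", "B", 1000), ("B", "C", 1500), ("C", "D", 2000)}
within_5000 = reachable_with_distance(sample, 5000)
```

Note that the rules produce one fact per distinct path length, so a pair of cities can appear several times with different distances; a query that wants only the shortest route would need to aggregate over them.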
4.3.1. Entity and Relationship Usage in Rules
In the match and infer clauses:
- For entities, use isa: $variable isa entity_type ( attribute: $value [, attribute: $value]* )
- For relationships, use forms: ($variable1, $variable2) forms relationship_type
This distinction clarifies when we're dealing with entities (isa) versus relationships (forms) in our rules.
define rule infer_close_collaboration as
match
    $employee1 isa person ( name: $name1 ),
    $employee2 isa person ( name: $name2 ),
    $project isa project ( name: $project_name ),
    ($employee1, $project) forms assigned_to,
    ($employee2, $project) forms assigned_to,
    $employee1 != $employee2
infer derive
    ($employee1, $employee2) forms close_collaborator ( project: $project_name );
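The infer_close_collaboration rule is a self-join on assigned_to: two different employees match when they share a project. A minimal Python sketch of that join follows; the function name close_collaborators and the sample assignments are hypothetical. Since the rule only requires $employee1 != $employee2, each pair is produced in both orders, which the sketch reproduces with permutations.

```python
from itertools import permutations


def close_collaborators(assigned_to):
    """assigned_to: iterable of (employee, project_name) pairs.

    Returns (employee1, employee2, project_name) triples for every ordered
    pair of distinct employees assigned to the same project.
    """
    members_by_project = {}
    for employee, project in assigned_to:
        members_by_project.setdefault(project, set()).add(employee)
    return {
        (e1, e2, project)
        for project, members in members_by_project.items()
        for e1, e2 in permutations(sorted(members), 2)
    }


# Hypothetical sample data
pairs = close_collaborators([("ann", "p1"), ("bob", "p1"), ("cy", "p2")])
# pairs == {("ann", "bob", "p1"), ("bob", "ann", "p1")}
```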
5. Inferred Relationships:
By default, the system will determine whether to materialize inferred relationships or compute them on-demand based on internal heuristics. Users can override this behavior by specifying 'materialize' or 'derive' in the 'infer' clause of a rule. Materialized inferences will be explicitly stored and updated when relevant base data changes. Derived inferences will be computed when queried.
define rule colleagues as
match
    $emp1 (employee: $person1, employer: $company) forms employment,
    $emp2 (employee: $person2, employer: $company) forms employment,
    $person1 != $person2
infer [materialize | derive]
    ($person1, $person2) forms colleague;
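The difference between the two modes can be sketched in Python using the colleagues rule: a materialized inference is stored and kept up to date as base facts arrive, while a derived inference is recomputed from base facts at query time. The class ColleagueStore and its methods are hypothetical names; the specification does not describe the engine's internal maintenance strategy, so treat this as one possible reading.

```python
class ColleagueStore:
    def __init__(self):
        self.employment = set()    # base facts: (person, company)
        self.materialized = set()  # stored colleague pairs

    def insert_employment(self, person, company):
        # Materialize: update the stored inference when base data changes.
        for (other, other_company) in self.employment:
            if other_company == company and other != person:
                self.materialized.add((person, other))
                self.materialized.add((other, person))
        self.employment.add((person, company))

    def derive_colleagues(self):
        # Derive: compute the inference on demand; nothing is stored.
        return {
            (p1, p2)
            for (p1, c1) in self.employment
            for (p2, c2) in self.employment
            if c1 == c2 and p1 != p2
        }


store = ColleagueStore()
store.insert_employment("john", "techcorp")
store.insert_employment("jane", "techcorp")
store.insert_employment("bob", "othercorp")
# Both modes agree on the result; they differ in when the work is done.
```

Materializing trades storage and insert-time work for cheap reads, while deriving keeps writes cheap and pays at query time.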
6. Inference Syntax for Schema Evolution:
To add new attributes to existing entities in a rule's inference, use the 'extend' clause followed by the entity variable:
infer [derive | materialize] extend $entity_var ( attribute_name: attribute_value [, attribute_name: attribute_value]* );
derive in our language means the dynamic, on-demand computation of data. This computed data is not stored persistently but generated when needed. It can involve:
- Adding new attributes to existing entities
- Creating entirely new entities or relationships based on existing data
- Transforming existing data into new forms
define rule fahrenheit_to_celsius as
match
    $temp_f isa temperature ( value: $fahrenheit, unit: "Fahrenheit" )
compute
    $celsius = ($fahrenheit - 32) * 5 / 9
infer derive
    extend $temp_f ( celsius_value: $celsius );
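As a quick check of the compute expression, the Python sketch below applies the same formula and returns an extended copy of a temperature record, which is roughly what a derived attribute amounts to. The function name and the dictionary representation of an entity are assumptions for illustration.

```python
def derive_celsius(temperature):
    """Return a copy of a temperature record with celsius_value added.

    Records whose unit is not "Fahrenheit" do not match the rule and are
    returned unchanged.
    """
    if temperature.get("unit") != "Fahrenheit":
        return dict(temperature)
    celsius = (temperature["value"] - 32) * 5 / 9
    return {**temperature, "celsius_value": celsius}


boiling = derive_celsius({"value": 212, "unit": "Fahrenheit"})
# boiling["celsius_value"] == 100.0
```

The original record is left untouched, matching the idea that derived data is generated when needed rather than stored.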
// Adding multiple attributes to an existing entity
define rule enrich_person_data as
match
    $person isa person ( name: $name, birth_year: $year )
compute
    $age = current_year() - $year,
    $generation = categorize_generation($year)
infer derive
    extend $person ( age: $age, generation: $generation );

// Creating a new entity
define rule create_celsius_reading as
match
    $temp_f isa temperature ( value: $fahrenheit, unit: "Fahrenheit", timestamp: $time )
compute
    $celsius = ($fahrenheit - 32) * 5 / 9
infer materialize
    $temp_c isa temperature ( value: $celsius, unit: "Celsius", timestamp: $time, original_reading: $temp_f );
Inference Syntax for Entity and Relationship Creation/Derivation:
To create/derive a new entity in a rule's inference:
infer [derive | materialize] $new_entity isa entity_type ( attribute_name: attribute_value [, attribute_name: attribute_value]* );
To create/derive a new relationship in a rule's inference:
infer [derive | materialize] (role1: $entity1, role2: $entity2 [, role3: $entity3]*) forms relationship_type ( attribute_name: attribute_value [, attribute_name: attribute_value]* );
'derive' indicates that the inferred entity or relationship is computed on-demand. 'materialize' indicates that the inferred entity or relationship is stored persistently.
define rule celsius_conversion as
match
    $temp_f isa temperature ( value: $fahrenheit, unit: "Fahrenheit" )
compute
    $celsius = ($fahrenheit - 32) * 5 / 9
infer derive
    extend $temp_f ( celsius_value: $celsius );

define rule create_friendship as
match
    $person1 isa person ( name: $name1 ),
    $person2 isa person ( name: $name2 ),
    (actor: $person1, target: $person2) forms social_interaction ( count: $count )
compute
    $friendship_strength = calculate_strength($count)
infer materialize
    (friend1: $person1, friend2: $person2) forms friendship (
        strength: $friendship_strength,
        formed_at: current_timestamp()
    );
7. Joins
7.1. Entity and Relationship Definitions
Let's start by defining our entities and relationships:
define entity person as
    name: string,
    email: string;

define entity department as
    name: string,
    budget: double;

define entity project as
    name: string,
    start_date: date,
    end_date: date;

define relationship works_in as (employee: person, department: department);
define relationship manages as (manager: person, department: department);
define relationship assigned_to as (employee: person, project: project);
7.2. Complex Query Example
Here's a query that demonstrates joining across multiple entities:
7.2.1. Query Description
Find all managers who work in departments with a budget over 1 million and are also assigned to projects ending in 2023, along with their department names and project names.
7.2.2. Query Syntax
match
    $manager isa person ( name: $manager_name, email: $manager_email ),
    $department isa department ( name: $dept_name, budget: $budget ),
    $project isa project ( name: $project_name, end_date: $end_date ),
    (manager: $manager, department: $department) forms manages,
    (employee: $manager, department: $department) forms works_in,
    (employee: $manager, project: $project) forms assigned_to,
    $budget > 1000000,
    $end_date >= date("2023-01-01") and $end_date <= date("2023-12-31")
get $manager_name, $manager_email, $dept_name, $project_name;
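For readers more used to imperative code, the join in this query can be sketched in Python as nested lookups over the three relationships plus the two filters. The function name, the dictionary and set representations, and the sample data in the usage lines are all hypothetical; a real engine would plan the join rather than loop like this.

```python
from datetime import date


def managers_on_2023_projects(people, departments, projects,
                              manages, works_in, assigned_to):
    """people/departments/projects: id -> attribute dict.
    manages/works_in/assigned_to: sets of (person_id, other_id) pairs.
    """
    rows = []
    for (manager, dept) in sorted(manages):
        # The manager must also work in the department they manage.
        if (manager, dept) not in works_in:
            continue
        # $budget > 1000000
        if departments[dept]["budget"] <= 1_000_000:
            continue
        for (employee, project) in sorted(assigned_to):
            end = projects[project]["end_date"]
            # Same person, and the project ends within 2023.
            if employee == manager and date(2023, 1, 1) <= end <= date(2023, 12, 31):
                rows.append((people[manager]["name"], people[manager]["email"],
                             departments[dept]["name"], projects[project]["name"]))
    return rows


people = {"p1": {"name": "Ann", "email": "ann@example.com"}}
departments = {"d1": {"name": "Research", "budget": 2_500_000.0}}
projects = {"x1": {"name": "Atlas", "end_date": date(2023, 6, 30)},
            "x2": {"name": "Borealis", "end_date": date(2024, 2, 1)}}
rows = managers_on_2023_projects(
    people, departments, projects,
    manages={("p1", "d1")}, works_in={("p1", "d1")},
    assigned_to={("p1", "x1"), ("p1", "x2")},
)
# rows == [("Ann", "ann@example.com", "Research", "Atlas")]
```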
7.2.3. Rule Definition Using the Complex Query
Here's how you might use this query in a rule:
define rule find_high_budget_managers_on_2023_projects as
match
    $manager isa person ( name: $manager_name, email: $manager_email ),
    $department isa department ( name: $dept_name, budget: $budget ),
    $project isa project ( name: $project_name, end_date: $end_date ),
    (manager: $manager, department: $department) forms manages,
    (employee: $manager, department: $department) forms works_in,
    (employee: $manager, project: $project) forms assigned_to,
    $budget > 1000000,
    $end_date >= date("2023-01-01") and $end_date <= date("2023-12-31")
infer derive
    $result isa manager_project_summary (
        manager_name: $manager_name,
        manager_email: $manager_email,
        department_name: $dept_name,
        project_name: $project_name
    );
This rule creates derived manager_project_summary entities based on the complex join across person, department, and project entities.
8. Query vs Rule
In our knowledge graph query language, there are two primary approaches to working with data:
- Direct Querying (Match/Get)
- Rule-Based Inference (Rule/Match/Infer)
This section explains the differences between the two approaches, with use cases and an example of each.
8.1. Match/Get (Direct Querying)
8.1.1. Syntax
match
    // pattern matching
get
    // variables to retrieve
8.1.2. Purpose
- Used for direct, one-time queries to retrieve existing data.
8.1.3. Characteristics
- Executed immediately when you run the query.
- Results are returned to the user but not stored in the database.
- Typically used for simpler, direct data retrieval.
- Each query is standalone.
8.1.4. When to Use
- For ad-hoc queries where you need immediate results.
- When exploring data or testing hypotheses.
- For simple data retrieval that doesn't require complex inference.
- When you don't need to persist the results or reuse the query logic.
8.1.5. Example
match $person isa person ( name: $name, age: $age ), $age > 30 get $name, $age;
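In Python terms, this query behaves like a filter followed by a projection. The sketch below is only an analogy; the function name people_older_than is hypothetical, and the second sample row is invented for illustration.

```python
def people_older_than(people, min_age):
    """Analogue of: match ... $age > min_age get $name, $age;"""
    return [(p["name"], p["age"]) for p in people if p["age"] > min_age]


people = [
    {"name": "John Doe", "age": 30},  # from the insertion example in section 2.1
    {"name": "Jane Roe", "age": 41},  # hypothetical extra row
]
result = people_older_than(people, 30)
# result == [("Jane Roe", 41)]; John Doe is excluded because 30 > 30 is false
```

The result is returned to the caller and nothing is written back, which is the defining property of a match/get query.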
8.2. Rule/Match/Infer (Rule-Based Inference)
8.2.1. Syntax
define rule rule_name as
match
    // pattern matching
infer [derive | materialize]
    // new data to infer
8.2.2. Purpose
- Used to define reusable patterns for inferring new data or relationships.
8.2.3. Characteristics
- Defined once, then automatically applied whenever relevant data changes or when explicitly invoked.
- Results can be derived on-demand (with 'derive') or stored persistently (with 'materialize').
- Can encapsulate more complex logic and multi-step inferences.
- Rules can be reused across different contexts and combined with other rules.
8.2.4. When to Use
- For complex inferences that you want to automate and reuse.
- When you need to derive new data based on existing data.
- For maintaining derived properties that should be updated whenever base data changes.
- When implementing business logic that should be consistently applied across your database.
8.2.5. Example
define rule categorize_senior_employees as
match
    $employee isa person ( name: $name, hire_date: $hire_date ),
    $years_employed = years_between($hire_date, current_date()),
    $years_employed >= 10
infer derive
    extend $employee ( employee_category: "Senior" );
9. Key Differences
Aspect | Match/Get (Direct Querying) | Rule/Match/Infer (Rule-Based Inference) |
---|---|---|
Execution | Immediate | Defined once, applied automatically or manually |
Result Persistence | Not stored | Can be derived or materialized |
Complexity | Typically simpler | Can handle more complex logic |
Reusability | Standalone queries | Reusable across contexts |
Use Case | Ad-hoc data retrieval and analysis | Implementing persistent business logic |
10. Custom or System Procedure (cos_proc)
The cos_proc keyword is used to invoke built-in or user-defined functions within rules. These functions can perform complex operations, including inferring relationships and modifying the knowledge graph.
Syntax: Either all unnamed:
cos_proc function_name(arg1, arg2, arg3, ...)
Or all named:
cos_proc function_name(param1: value1, param2: value2, param3: value3, ...)
- function_name: Name of the built-in or user-defined function
- In the unnamed style, the order of arguments is significant
- In the named style, the order of arguments is not significant
Mixing named and unnamed parameters in the same function call is not allowed.
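The all-unnamed or all-named constraint can be sketched as a small dispatcher in Python. The names REGISTRY and call_cos_proc, and the body of proximity_check (which treats locations as (x, y) tuples), are assumptions for illustration; the specification does not define how procedures are registered or what proximity_check computes.

```python
import math


def proximity_check(person1, person2, loc1, loc2, max_distance):
    # Hypothetical body: locations are (x, y) tuples in arbitrary units.
    return math.dist(loc1, loc2) <= max_distance


# Hypothetical registry of callable procedures
REGISTRY = {"proximity_check": proximity_check}


def call_cos_proc(name, *args, **kwargs):
    """Invoke a registered procedure with all-unnamed or all-named arguments."""
    if args and kwargs:
        raise ValueError("cos_proc: cannot mix named and unnamed arguments")
    return REGISTRY[name](*args, **kwargs)


# Unnamed style: order matters.
by_position = call_cos_proc("proximity_check", "ann", "bob", (0, 0), (3, 4), 5.0)
# Named style: order does not matter.
by_name = call_cos_proc("proximity_check", max_distance=5.0, loc2=(3, 4),
                        loc1=(0, 0), person2="bob", person1="ann")
```

Rejecting mixed calls up front keeps the binding of values to parameters unambiguous, which is presumably the motivation for the rule.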
define rule infer_collaboration as
match
    $emp1 isa employee,
    $emp2 isa employee,
    $emp1 != $emp2
cos_proc calculate_collaboration($emp1, $emp2, global.projects, "last_6_months", 0.7);

define rule infer_proximity as
match
    $person1 isa person ( location: $loc1 ),
    $person2 isa person ( location: $loc2 ),
    $person1 != $person2
cos_proc proximity_check(person1: $person1, person2: $person2, loc1: $loc1, loc2: $loc2, max_distance: 5.0);
11. Data Types
- string: Text data
- int: Integer numbers
- double: Floating-point numbers
- date: Date in the format YYYY-MM-DD
- boolean: True or false values
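A minimal sketch of how these five types might map onto host-language values, written in Python. The PARSERS table and parse_value are hypothetical, and the accepted boolean spellings ("true"/"false", case-insensitive) are an assumption, since the specification does not give the literal syntax for booleans.

```python
from datetime import date

# Data type name -> function that converts literal text to a Python value
PARSERS = {
    "string": str,
    "int": int,
    "double": float,
    "date": date.fromisoformat,  # expects YYYY-MM-DD
    # Assumed spellings; a KeyError signals an unrecognized boolean literal.
    "boolean": lambda text: {"true": True, "false": False}[text.lower()],
}


def parse_value(data_type, text):
    """Convert literal text to a Python value according to its declared type."""
    return PARSERS[data_type](text)


start = parse_value("date", "2022-03-01")
salary = parse_value("double", "100000.00")
```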
12. Comments
Single-line comments start with //:
// This is a single-line comment
Multi-line comments are enclosed in /* and */:
/* This is a
   multi-line comment */