Azure Cosmos DB Graph API: Modeling Connected Data

While Cosmos DB is best known for its document (SQL) API, its Graph API, built on Apache TinkerPop and queried with the Gremlin language, is a strong fit for connected data. If your data is mostly about relationships (social networks, recommendations, knowledge graphs), the Graph API may be the right choice.

When to Use Graph

Graph databases excel when:

  • Relationships are as important as the data itself
  • You need to traverse connections of arbitrary depth
  • Queries like “friends of friends who like X” are common
  • Your domain is naturally a network: social graphs, fraud detection, knowledge graphs

They are a poor fit for simple CRUD workloads or when you always look records up by a single key; a document or key-value model is simpler and cheaper there.

Graph Concepts

  • Vertices (Nodes): Entities like Person, Product, Location
  • Edges (Relationships): Connections like “knows”, “purchased”, “located_in”
  • Properties: Key-value attributes on vertices and edges
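
To make the three concepts concrete, here is a minimal in-memory sketch in Python. It is not how Cosmos DB stores a graph; the `Vertex` and `Edge` classes and their field names are hypothetical and exist only to show how vertices, edges, and properties relate.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """An entity: an id, a label such as 'person', and key-value properties."""
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled connection between two vertices, with its own properties."""
    label: str
    out_v: str  # id of the source vertex
    in_v: str   # id of the target vertex
    properties: dict = field(default_factory=dict)


john = Vertex("john", "person", {"name": "John", "age": 35})
jane = Vertex("jane", "person", {"name": "Jane"})

# "John knows Jane since 2020": the edge carries the 'since' property.
knows = Edge("knows", out_v=john.id, in_v=jane.id, properties={"since": 2020})
```

Note that the edge, not either vertex, holds the `since` property; this is what lets a graph model describe the relationship itself.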

Gremlin Query Examples

// Add a vertex (person)
// On a partitioned graph, also set the container's partition key property here.
g.addV('person')
 .property('id', 'john')
 .property('name', 'John')
 .property('age', 35)

// Add another vertex
g.addV('person')
 .property('id', 'jane')
 .property('name', 'Jane')

// Create relationship
g.V('john').addE('knows').to(g.V('jane'))
 .property('since', 2020)

// Find John's friends
g.V('john').out('knows').values('name')
// Result: ["Jane"]

// Find friends of friends
g.V('john').out('knows').out('knows').values('name')

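// Recommendation: products bought by people who bought the same products as John,
// excluding products John already owns
g.V('john').as('j')
 .out('purchased')
 .in('purchased').where(neq('j'))                    // other buyers, not John himself
 .out('purchased')
 .where(__.not(__.in('purchased').where(eq('j'))))   // drop what John already bought
 .dedup()
 .limit(5)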
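
To see what these traversals compute, the following Python sketch mimics `out()`, `in()`, and the recommendation query on a tiny in-memory graph. It is an illustration of the traversal logic only; `TinyGraph`, `recommend`, and the sample data are hypothetical and have nothing to do with how Cosmos DB executes Gremlin.

```python
class TinyGraph:
    """Adjacency lists keyed by (vertex id, edge label)."""

    def __init__(self):
        self._out = {}  # (source id, label) -> list of target ids
        self._in = {}   # (target id, label) -> list of source ids

    def add_edge(self, src, label, dst):
        self._out.setdefault((src, label), []).append(dst)
        self._in.setdefault((dst, label), []).append(src)

    def out(self, ids, label):
        """Follow outgoing edges with the given label, like Gremlin's out()."""
        return [dst for v in ids for dst in self._out.get((v, label), [])]

    def in_(self, ids, label):
        """Follow incoming edges with the given label, like Gremlin's in()."""
        return [src for v in ids for src in self._in.get((v, label), [])]


def recommend(graph, person, limit=5):
    """Products bought by co-purchasers of `person` that `person` does not own."""
    owned = graph.out([person], "purchased")
    peers = [p for p in graph.in_(owned, "purchased") if p != person]
    seen, picks = set(owned), []
    for product in graph.out(peers, "purchased"):
        if product not in seen:  # skip owned products and duplicates (dedup)
            seen.add(product)
            picks.append(product)
    return picks[:limit]


g = TinyGraph()
g.add_edge("john", "knows", "jane")
g.add_edge("jane", "knows", "sam")
g.add_edge("john", "purchased", "laptop")
g.add_edge("jane", "purchased", "laptop")
g.add_edge("jane", "purchased", "mouse")

friends_of_friends = g.out(g.out(["john"], "knows"), "knows")
recommendations = recommend(g, "john")
```

With this sample data, `friends_of_friends` contains only Sam and `recommendations` contains only the mouse: Jane is the one other laptop buyer, and the laptop itself is filtered out because John already owns it.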

.NET Client

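// Requires the Gremlin.Net NuGet package (3.5 or later for GraphSON2MessageSerializer).
using Gremlin.Net.Driver;
using Gremlin.Net.Structure.IO.GraphSON;

var gremlinServer = new GremlinServer(
    hostname: "myaccount.gremlin.cosmos.azure.com",
    port: 443,
    enableSsl: true, // Cosmos DB only accepts TLS connections
    username: "/dbs/graphdb/colls/social",
    password: primaryKey);

// Cosmos DB expects GraphSON 2 rather than the driver's default serializer.
using var client = new GremlinClient(gremlinServer, new GraphSON2MessageSerializer());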

// Execute query
var results = await client.SubmitAsync<dynamic>(
    "g.V('john').out('knows').values('name')");

foreach (var result in results)
{
    Console.WriteLine(result);
}

Partition Strategy

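Partitioning drives both cost and latency in Cosmos DB. For graphs, choose a partition key that keeps vertices you traverse together in the same partition where possible, because a traversal that stays inside one partition avoids cross-partition fan-out. Common strategies: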

  • Partition by tenant/organization in multi-tenant systems
  • Partition by locale for location-based graphs
  • Use a synthetic partition key combining entity type and region
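
As an illustration of the last strategy, the sketch below builds a synthetic partition key value from an entity type and a region. The function names, the `-` separator, and the `pk` property name are assumptions made for the example; use whatever partition key path your graph container was created with.

```python
def synthetic_partition_key(entity_type: str, region: str) -> str:
    """Combine entity type and region into a single partition key value."""
    parts = [entity_type.strip().lower(), region.strip().lower()]
    if not all(parts):
        raise ValueError("entity_type and region must be non-empty")
    return "-".join(parts)


def vertex_properties(vertex_id: str, entity_type: str, region: str,
                      pk_property: str = "pk") -> dict:
    """Properties to set on a new vertex, including the partition key.

    `pk_property` is a hypothetical name; it must match the container's
    partition key path.
    """
    return {
        "id": vertex_id,
        pk_property: synthetic_partition_key(entity_type, region),
    }


props = vertex_properties("john", "Person", "WestEurope")
```

Normalizing case and whitespace keeps `Person`/`WestEurope` and `person`/`westeurope` from landing in different logical partitions by accident. Keep in mind that putting the entity type in the key places different kinds of vertices in different partitions, so this suits graphs where most traversals stay within one type and region.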

Key Takeaways

  • Use Graph API when relationships are the core of your domain
  • Gremlin provides powerful traversal queries for connected data
  • Plan partitioning to keep related vertices together
  • Great for social networks, recommendations, and knowledge graphs
