How identity resolution works: keys, graphs, and stitching

October 27, 2025

Todd Gorsuch

Why identity resolution matters for modern CX

Executives tie growth to trusted customer understanding. Identity resolution connects events, profiles, and permissions into an accurate view of a person or account. This capability reduces waste in media, improves service routing, and enables personalisation that respects consent. When identity resolution works, operations answer with confidence. When it fails, channels disagree, journeys break, and risk increases. Cookies, device identifiers, and account IDs now coexist with privacy constraints and channel silos. The discipline needs clear definitions, testable methods, and governance. Identity resolution aligns data management with experience outcomes by creating a shared identity fabric across marketing, service, product, and analytics. This article explains keys, graphs, and stitching. It sets definitions, shows mechanisms, outlines risks, and provides measurement and next steps. It draws on established guidance from standards bodies, research, and platform documentation.¹²³

What is identity resolution in plain terms

Identity resolution is the process that determines whether records refer to the same real-world entity. A record might be a web cookie, a CRM profile, a call transcript, or an event in a log. The discipline combines deterministic rules with probabilistic models to decide when to merge or link. In the classic literature this is called record linkage or entity resolution, and it covers matching, deduplication, and consolidation. The field defines standard steps. Data is cleaned and standardized. Candidate pairs are generated. Pairs are scored with comparison functions. Decisions are made with thresholds. Surviving links are persisted for downstream use. This framing lets teams implement the capability in a repeatable way and audit decisions over time. The same approach applies to consumers, businesses, and households, and to first-party data as well as pseudonymous data.⁴⁵
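The five steps can be sketched end to end on a toy dataset. This is a minimal illustration, not a production design: the record fields, the blocking key (first letter of the surname plus postcode), the `difflib` similarity measure, and the 0.85 threshold are all assumptions chosen for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records; the field names are assumptions for this sketch.
records = [
    {"id": "crm-1", "name": "Jane  Smith ", "postcode": "2000"},
    {"id": "web-7", "name": "jane smith", "postcode": "2000"},
    {"id": "crm-2", "name": "John Smythe", "postcode": "3000"},
]

def standardize(rec):
    """Step 1: clean and standardize (lowercase, collapse whitespace)."""
    return {**rec, "name": " ".join(rec["name"].lower().split())}

def blocking_key(rec):
    """Only records that share this key become candidate pairs."""
    surname = rec["name"].split()[-1]
    return (surname[0], rec["postcode"])

def candidate_pairs(recs):
    """Step 2: generate candidate pairs within each block."""
    blocks = defaultdict(list)
    for rec in recs:
        blocks[blocking_key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)

def score(a, b):
    """Step 3: compare a pair; here a single name similarity in [0, 1]."""
    return SequenceMatcher(None, a["name"], b["name"]).ratio()

THRESHOLD = 0.85  # Step 4: assumed cut-off, to be tuned on labeled data.

clean = [standardize(r) for r in records]
links = []
for a, b in candidate_pairs(clean):
    s = score(a, b)
    if s >= THRESHOLD:
        links.append({"left": a["id"], "right": b["id"], "score": round(s, 3)})

# Step 5: `links` is what would be persisted for downstream use.
print(links)
```

Blocking keeps the comparison count manageable: the third record never meets the first two because its postcode differs, so only one pair is scored.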

How do identity keys work across channels

Identity keys are stable identifiers that anchor records. Keys include user IDs, account numbers, loyalty IDs, device IDs, UUIDs, and first-party cookies. A key may be natural, like an account number created by a core system, or synthetic, like a UUID generated to tag events. HTTP cookies store small key-value pairs in the browser and allow a site to recognize a returning user within that domain. Device ID schemes and app instance identifiers serve a similar role in mobile contexts. Google Analytics 4 illustrates an identity hierarchy that combines User ID, Device ID, and signed-in signals to stitch sessions when possible. Keys do not guarantee truth. Keys provide anchors that reduce the search space and guide matching. Good design treats keys as evidence, not as absolute identity. This approach supports resilience when ecosystems change.²⁶⁷
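One way to treat keys as ranked evidence is a simple precedence rule: use the strongest key present on an event and fall back to weaker ones. The sketch below assumes a three-level order (user ID, then device ID, then cookie) and hypothetical field names. It is an illustration of the idea only and does not reproduce how Google Analytics 4 or any other platform implements its hierarchy.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Assumed precedence, strongest evidence first (illustrative only).
KEY_PRECEDENCE = ("user_id", "device_id", "cookie_id")

@dataclass
class Event:
    name: str
    user_id: Optional[str] = None
    device_id: Optional[str] = None
    cookie_id: Optional[str] = None
    # Synthetic key: a random (version 4) UUID tags every event.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def anchor_key(event: Event) -> Optional[Tuple[str, str]]:
    """Return (key_type, value) for the strongest key present, else None."""
    for key_type in KEY_PRECEDENCE:
        value = getattr(event, key_type)
        if value:
            return (key_type, value)
    return None

signed_in = Event("page_view", user_id="u-42", cookie_id="c-9")
anonymous = Event("page_view", cookie_id="c-9")

print(anchor_key(signed_in))  # the user ID wins over the cookie
print(anchor_key(anonymous))  # falls back to the cookie
```

Returning the key type alongside the value keeps the strength of the evidence visible to whatever matching step consumes it.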

Where do graphs fit and why they help

Graphs model relationships between identifiers, attributes, and events. Nodes represent entities such as person, device, email, or household. Edges represent relationships such as uses, logs in as, or shares address with. Graph traversal exposes clusters and communities that indicate a single customer behind multiple identifiers. Graphs help in three ways. First, they make relationship evidence explicit and queryable. Second, they support incremental updates that avoid full recompute. Third, they power machine learning features such as triangle counts and connected components that improve match quality. Graph techniques are well studied in data management and have proven effective in entity resolution scenarios. Practitioners can start with a simple bipartite graph of people and identifiers, then evolve to richer schemas as data maturity grows.⁸⁹
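The simple bipartite starting point can be sketched with a union-find structure that computes connected components over record-to-identifier edges. The edge values and prefixes below are made up for illustration, and a real deployment would more likely use a graph database or graph library.

```python
from collections import defaultdict

# Hypothetical edges of the form (record, identifier); all values are invented.
edges = [
    ("crm:1001", "email:jane@example.com"),
    ("web:cookie-a", "email:jane@example.com"),
    ("web:cookie-a", "device:ios-77"),
    ("app:install-5", "device:ios-77"),
    ("crm:2002", "email:john@example.com"),
]

parent = {}

def find(node):
    """Return the root of the node's set, compressing the path on the way."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node

def union(a, b):
    """Merge the sets that contain a and b."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_b] = root_a

for record, identifier in edges:
    union(record, identifier)

# Group every node under its root to expose the clusters.
clusters = defaultdict(set)
for node in list(parent):
    clusters[find(node)].add(node)

for members in clusters.values():
    print(sorted(members))
```

Adding a new edge later needs only one more `union` call, which is the incremental-update property described above. Note that plain connected components will chain through any shared identifier, so a shared device or family email can over-link without additional rules.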

What stitching actually does under the hood

Stitching links records that likely belong together and prevents links that likely do not. A practical pipeline applies rules in tiers. Tier one runs deterministic joins on exact keys such as account ID and verified email. Tier two applies fuzzy comparison on names, addresses, and phones using tokenization, phonetics, and edit distance. Tier three introduces probabilistic scoring that weights agreement and disagreement across fields. The classic merge/purge problem shows how to scale these comparisons with blocking and indexing. A production system retains a history of merges and splits, supports human adjudication for difficult cases, and writes explainable evidence with each link. This makes the graph auditable and reversible. The approach balances recall and precision while preserving customer trust.⁴¹⁰
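The first two tiers can be sketched as a single function that returns both a decision and the evidence behind it. The field names, the rule that a fuzzy name match must be backed by an exact phone match, and the 0.9 threshold are assumptions for this example. Tier three is left out here.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())

def stitch(a, b, fuzzy_threshold=0.9):
    """Return a decision with the tier and evidence that produced it."""
    # Tier one: deterministic join on exact, trusted keys.
    for key in ("account_id", "verified_email"):
        if a.get(key) and a.get(key) == b.get(key):
            return {"link": True, "tier": 1, "evidence": f"exact {key}"}

    # Tier two: fuzzy name comparison, guarded by an exact phone match.
    name_sim = SequenceMatcher(
        None, normalize(a.get("name", "")), normalize(b.get("name", ""))
    ).ratio()
    same_phone = bool(a.get("phone")) and a.get("phone") == b.get("phone")
    if same_phone and name_sim >= fuzzy_threshold:
        evidence = f"exact phone, name_sim={name_sim:.2f}"
        return {"link": True, "tier": 2, "evidence": evidence}

    # No rule fired: leave the records unlinked but keep the evidence.
    return {"link": False, "tier": None, "evidence": f"name_sim={name_sim:.2f}"}

crm = {"account_id": "A1", "name": "Katherine O'Neil", "phone": "0400000001"}
web = {"account_id": "A1", "name": "Kate ONeil"}
call = {"name": "Katherine ONeil", "phone": "0400000001"}
other = {"name": "Bob Stone", "phone": "0400000002"}

for candidate in (web, call, other):
    print(stitch(crm, candidate))
```

Writing the tier and evidence with every decision, including non-links, is what makes later audit and reversal practical.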

Which privacy and consent rules shape identity resolution

Privacy engineering and regulation shape how teams design identifiers, storage, and decisions. De-identification techniques reduce direct identifiers and control linkage risk. GDPR defines personal data broadly and clarifies that pseudonymous identifiers remain personal when re-identification is reasonably possible. Best practice stores identifiers with purpose limitation and access control, and uses consent status as a join key in activation. The IAB Tech Lab provides practical addressability standards and taxonomies that help align channel integration. Executives should treat identity as part of a broader privacy program with DPIAs, data minimisation, and deletion workflows. This stance protects customers and reduces legal and reputational risk while enabling responsible analytics.¹¹¹²¹³
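Using consent status as a join key can be as simple as filtering resolved IDs against a consent store before activation. The store layout, purpose names, and default-deny behaviour below are assumptions for illustration and do not represent legal guidance.

```python
# Hypothetical consent store keyed by (resolved person ID, purpose).
consent = {
    ("p-1", "marketing_email"): True,
    ("p-2", "marketing_email"): False,
    # p-3 has no consent record for this purpose.
}

def audience_for(purpose, resolved_ids):
    """Keep only IDs with an explicit opt-in for the given purpose.

    A missing record is treated as no consent (assumed default-deny policy).
    """
    return [pid for pid in resolved_ids if consent.get((pid, purpose), False)]

print(audience_for("marketing_email", ["p-1", "p-2", "p-3"]))
```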

How to measure match quality and operational impact

Leaders measure both algorithmic quality and business value. Precision measures the share of predicted matches that are correct. Recall measures the share of true matches that the system found. F1 balances the two. Teams create labeled samples for adjudication, then compute these metrics per key population and per channel. Operationally, identity resolution should reduce duplicate communications, increase channel reach on known customers, and shorten handle time in service. It should also improve attribution quality and personalization acceptance. Continuous monitoring matters. Drift in input data or channel changes can reduce quality without obvious symptoms. Clear dashboards and regular review create a feedback loop that maintains trust. This measurement discipline turns identity resolution into a managed product, not a one-time project.¹⁴
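These metrics can be computed pairwise: each predicted link is compared with a labeled set of true links. The record names and the labeled sample below are invented for the example.

```python
def pair(a, b):
    """Order-independent representation of a link between two records."""
    return frozenset((a, b))

def match_quality(predicted, truth):
    """Pairwise precision, recall, and F1 over sets of links."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Labeled sample: a, b, and c are one person; d and e are another.
truth = {pair("a", "b"), pair("a", "c"), pair("b", "c"), pair("d", "e")}
# System output: two correct links and one wrong link (d-f).
predicted = {pair("b", "a"), pair("a", "c"), pair("d", "f")}

print(match_quality(predicted, truth))
```

Here two of three predicted links are correct (precision 2/3) and two of four true links were found (recall 1/2), giving an F1 of 4/7. Running the same function per channel or per key population supports the segmented view described above.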

How do deterministic and probabilistic methods compare

Deterministic matching uses exact agreement on trusted fields. It is easy to explain and fast to compute. It can miss valid links when data quality is imperfect. Probabilistic matching assigns a score based on how discriminative each field is and how often values agree by chance. It recovers links that deterministic rules miss and includes a tunable threshold. It requires tuning and governance to stay credible. Modern systems blend the two. They start with deterministic rules to set anchors. They apply probabilistic scoring to candidate pairs. They enforce safety rails such as do not cross household boundaries without consent. This blended approach delivers performance and transparency. It also creates a natural path for human review at score bands where decisions are ambiguous.⁴⁵
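Probabilistic scoring of this kind is commonly expressed as log-likelihood-ratio weights in the style of the Fellegi-Sunter record linkage model. In the sketch below, the per-field probabilities and the two thresholds are illustrative assumptions rather than values estimated from data.

```python
import math

# m = P(field agrees | same entity), u = P(field agrees | different entities).
# These numbers are assumed for illustration only.
FIELDS = {
    "email": {"m": 0.95, "u": 0.001},
    "surname": {"m": 0.90, "u": 0.02},
    "postcode": {"m": 0.85, "u": 0.05},
}

def match_weight(a, b):
    """Sum agreement and disagreement weights over the comparable fields."""
    total = 0.0
    for name, p in FIELDS.items():
        if a.get(name) is None or b.get(name) is None:
            continue  # a missing value contributes no evidence either way
        if a[name] == b[name]:
            total += math.log2(p["m"] / p["u"])  # positive weight
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))  # negative weight
    return total

def decide(weight, lower=0.0, upper=8.0):
    """Two thresholds create a middle band that goes to human review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"

base = {"email": "jane@example.com", "surname": "smith", "postcode": "2000"}
same = dict(base)
partial = {"email": None, "surname": "smith", "postcode": "2010"}
different = {"email": "bob@example.com", "surname": "stone", "postcode": "3000"}

for other in (same, partial, different):
    weight = match_weight(base, other)
    print(round(weight, 2), decide(weight))
```

A rare field such as email earns a large weight when it agrees because chance agreement is unlikely, while a common field such as postcode earns less. The band between the two thresholds is where the human review mentioned above would apply.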

What does a pragmatic target architecture look like

A pragmatic architecture uses a layered structure. The ingestion layer standardizes events and profile data and assigns synthetic UUIDs. The identity layer stores the person identifier graph and the decision history. The decisioning layer exposes APIs for resolve, lookup, and unmerge, and writes explainable evidence. The activation layer uses the resolved IDs with consent flags for orchestration and analytics. The model layer adds features such as similarity scores, connected components, and recency. The governance layer manages policies, audits, and deletion. Google’s documentation on identity spaces shows a simple pattern for combining user and device identifiers. The IAB and CDP community provide practical building blocks and shared language. This structure supports scale and change while staying auditable.²⁶¹³
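The resolve, lookup, and unmerge operations of the decisioning layer can be sketched as a small in-memory class with an append-only decision history. The class name, method signatures, and ID format are hypothetical, and a real system would sit on durable storage with access control.

```python
import uuid
from datetime import datetime, timezone

class IdentityStore:
    """In-memory stand-in for the identity and decisioning layers."""

    def __init__(self):
        self.person_of = {}  # identifier -> person ID
        self.history = []    # append-only decision log

    def _log(self, action, **details):
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append({"action": action, "at": stamp, **details})

    def resolve(self, identifiers, evidence):
        """Link identifiers to one person ID, reusing a known ID if any."""
        known = sorted({self.person_of[i] for i in identifiers if i in self.person_of})
        person_id = known[0] if known else f"p-{uuid.uuid4()}"
        # Fold any other known persons into the surviving ID.
        for ident, pid in list(self.person_of.items()):
            if pid in known[1:]:
                self.person_of[ident] = person_id
        for ident in identifiers:
            self.person_of[ident] = person_id
        self._log("resolve", person_id=person_id,
                  identifiers=list(identifiers), evidence=evidence)
        return person_id

    def lookup(self, identifier):
        """Return the person ID for an identifier, or None if unknown."""
        return self.person_of.get(identifier)

    def unmerge(self, identifier, reason):
        """Detach one identifier onto a fresh person ID and record why."""
        old = self.person_of.get(identifier)
        new = f"p-{uuid.uuid4()}"
        self.person_of[identifier] = new
        self._log("unmerge", identifier=identifier,
                  from_person=old, to_person=new, reason=reason)
        return new

store = IdentityStore()
person = store.resolve(["email:jane@example.com", "cookie:c-9"], evidence="web login")
store.resolve(["cookie:c-9", "device:ios-77"], evidence="app login")
store.unmerge("cookie:c-9", reason="shared family browser")

print(store.lookup("device:ios-77") == person)  # still linked to the person
print(store.lookup("cookie:c-9") == person)     # detached by the unmerge
print([entry["action"] for entry in store.history])
```

Because every call appends to `history` with its evidence or reason, the log can be replayed or inspected during an audit.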

Which risks and failure modes deserve special attention

Identity resolution can over link and under link. Over linking merges different people and can harm trust. Under linking fragments the view and reduces value. Data drift quietly erodes quality. Key rotation and browser changes break joins. Consent state mismanagement triggers regulatory exposure. Teams reduce these risks by making evidence explicit, keeping merges reversible, tracking confidence, and separating person from household. They also log matching outcomes and human decisions for audit. They test edge cases that include twins, shared addresses, recycled phone numbers, and contact center identity spoofing. Finally, they plan for schema evolution. A safe system evolves as channels and regulations change without corrupting the graph. These practices are standard in mature entity resolution programs and are achievable with disciplined engineering.⁵¹¹

What are the first three steps to get started

Leaders start small and controlled. Step one defines the entity model and the minimum keyset for each channel, with clear semantics for person, account, device, and household. Step two implements a pilot pipeline that handles a narrow slice of data, writes explainable evidence, and exposes APIs for resolve and unmerge. Step three sets up measurement with labeled samples and a dashboard for precision and recall. This sequence builds credibility and avoids vendor lock-in. It also creates a platform for incremental adoption in marketing, service, and analytics. Use proven patterns from the literature and standards community. Validate against privacy guidance. Socialize the evidence model with CX and legal early. This approach delivers early value while laying a durable foundation for identity at enterprise scale.⁴¹¹¹³

FAQ

What is identity resolution in customer experience at Customer Science?
Identity resolution at Customer Science links records and events that refer to the same person, device, account, or household, using deterministic keys and probabilistic scores to create a single, auditable view for CX, service, and analytics.⁴⁵

How do keys, graphs, and stitching work together in identity resolution?
Keys anchor identifiers, graphs model relationships between those identifiers and entities, and stitching applies rules and scores to merge or separate records while writing explainable evidence for each decision.⁸⁹

Which regulations and standards guide identity resolution design?
Design aligns to GDPR principles for personal and pseudonymous data, to NIST guidance on de-identification, and to IAB Tech Lab addressability and taxonomy standards used in advertising and martech integrations.¹¹¹²¹³

Why measure precision and recall in an identity program?
Precision and recall quantify match correctness and coverage. Teams use labeled samples to compute these metrics and track drift, which protects customer trust and business outcomes over time.¹⁴

Which identifiers should enterprises prioritize for first-party identity?
Enterprises prioritize verified account IDs, user IDs, and synthetic UUIDs for events, and supplement with device and cookie identifiers, as illustrated in Google Analytics 4 identity spaces documentation.²⁶

Who benefits from a shared identity fabric across CX and service?
C-level leaders, enterprise CX executives, contact center leaders, and customer insight teams benefit from a shared identity fabric because it reduces waste, speeds service, and enables consent-aware personalization at scale.¹³

Which first steps should a business take to launch identity resolution with Customer Science?
A business should define the entity model and keyset, stand up a pilot pipeline that writes explainable evidence and supports unmerge, and implement a quality dashboard for precision and recall before scaling to activation.⁴¹⁴

Sources

1. MDN Web Docs. “HTTP cookies.” 2024. Mozilla. https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies
2. Google Analytics Help. “About identity spaces in Google Analytics.” 2024. Google. https://support.google.com/analytics/answer/9191807
3. CDP Institute. “CDP Institute Practical Guide to Identity Resolution.” 2020. CDP Institute. https://www.cdpinstitute.org/resource/cdp-institute-practical-guide-to-identity-resolution/
4. Peter Christen. “Data Matching: Concepts and Techniques for Record Linkage.” 2012. Springer via Google Books preview. https://books.google.com/books?id=qGx2fW8u2b0C
5. Hernandez, M. A., and Stolfo, S. J. “The Merge/Purge Problem for Large Databases.” 1995. Proceedings of ACM SIGMOD. http://www1.cs.columbia.edu/~martha/papers/merge-purge.pdf
6. Leach, P., Mealling, M., Salz, R. “A Universally Unique IDentifier (UUID) URN Namespace.” 2005. IETF RFC 4122. https://www.rfc-editor.org/rfc/rfc4122
7. IAB Tech Lab. “Project Rearc Addressability Working Group resources.” 2023. IAB Tech Lab. https://iabtechlab.com/project-rearc/
8. Neo4j Graph Data Science. “Entity Resolution using Graph Data Science.” 2023. Neo4j. https://neo4j.com/docs/graph-data-science/current/machine-learning/entity-resolution/
9. Getoor, L., and Machanavajjhala, A. “Entity Resolution for Big Data.” 2013. KDD Tutorial. http://www.cs.umd.edu/~getoor/Tutorials/kdd2013-tutorial-ER.pdf
10. Winkler, W. E. “Overview of Record Linkage and Current Research Directions.” 2006. U.S. Census Bureau. https://www.census.gov/srd/papers/pdf/rrs2006-02.pdf
11. NIST. “De-Identification of Personal Information.” NISTIR 8053. 2015. National Institute of Standards and Technology. https://nvlpubs.nist.gov/nistpubs/ir/2015/NIST.IR.8053.pdf
12. European Union. “General Data Protection Regulation, Recital 26.” 2016. EUR-Lex. https://eur-lex.europa.eu/eli/reg/2016/679/oj
13. IAB Tech Lab. “Taxonomy and Data Transparency Standards.” 2024. IAB Tech Lab. https://iabtechlab.com/standards/
14. Manning, C. D., Raghavan, P., Schütze, H. “Introduction to Information Retrieval.” 2008. Cambridge University Press, online manuscript. https://nlp.stanford.edu/IR-book/
