Duplicate contacts can turn up in your data for many reasons: users may create a contact for someone who is already in CiviCRM without realising it, duplicates may slip through the import process, and people may fill in forms about themselves on your site (perhaps with their name spelled differently or with a different email address) without realising they're already in your list of contacts.
CiviCRM is equipped with duplicate matching rules that are applied automatically when new contacts are created, and can be run manually at any time to search for duplicates. You can configure these rules to suit your needs.
To view the dedupe rules, go to Contacts > Find and Merge Duplicate Contacts in the navigation menu. This displays the following screen:
From the screen, here's an example of a process to dedupe all individuals in your data:
Different rules are configured for each contact type (individuals, organizations, and households). A default fuzzy rule and a default strict rule are set for each contact type. The default rules are used when CiviCRM invokes automatic checking, in ways we'll explain in detail shortly.
CiviCRM includes two categories of dedupe rules:
Strict: this type of rule places a priority on avoiding false matches, and therefore applies relatively rigid criteria. As a result, it may sometimes miss real duplicates.
Strict rules are invoked during imports to scan for duplicates without user intervention. These rules are used here because it is easier to sort out duplicates later than to disentangle two incorrectly merged contacts.
An example of a strict rule is one that matches individual contacts only if three criteria are met: identical email addresses, first names, and last names. Under this rule, Mike Tael and Michael Tael would both be allowed into the database even if they share an email address, because only two of the three criteria are met (last name and email, but not first name).
Default strict rules are also automatically checked when new contacts are created through online registrations including events, membership, contributions, and profile pages, and when you create a contact through CiviCRM's programming API.
Fuzzy: this type of rule uses a relatively loose definition of a match in the hope of catching as many potential duplicates as possible.
Fuzzy rules are used in instances where human intelligence can be applied to decide whether a match is accurate. This means that a wider range of possible match results is both permissible and useful.
Default fuzzy rules are automatically used to check for possible duplicates when contacts are added or edited via the CiviCRM user interface (the default strict rules are automatically used when contacts are added or edited via a Profile, the API, or on import). You'll probably also want to use a fuzzy rule when scanning your database for possible duplicates.
To determine whether two contacts are duplicates, CiviCRM checks up to five fields that you can specify. You can also set a length value which determines how many characters in the field should be compared. For example, if you set a length of 2 on the First name field, a first name of "Mike" would match "Michael" and they would be recognized as duplicates, because the first 2 characters are the same. However, if you set the length to 3 instead, "Mike" would no longer match "Michael" and they would be accepted as different contacts. If the length value is left blank, the comparison is done on the entire field value.
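The effect of the length value can be illustrated with a minimal Python sketch. This is not CiviCRM's own code: the function name `field_matches` is hypothetical, and the case-insensitive comparison and the treatment of empty values as non-matching are assumptions made for the illustration.

```python
def field_matches(value_a, value_b, length=None):
    """Compare two field values, optionally on their first `length` characters only.

    Hypothetical helper for illustration; not part of CiviCRM.
    """
    # Assumption: an empty or missing value never counts as a match.
    if not value_a or not value_b:
        return False
    # Assumption: comparison ignores case and surrounding whitespace.
    a, b = value_a.strip().lower(), value_b.strip().lower()
    if length is not None:
        # Only the leading `length` characters take part in the comparison.
        a, b = a[:length], b[:length]
    return a == b


print(field_matches("Mike", "Michael", length=2))  # True: "mi" == "mi"
print(field_matches("Mike", "Michael", length=3))  # False: "mik" != "mic"
print(field_matches("Mike", "Michael"))            # False: whole values differ
```

A shorter length catches more spelling variants but also raises the chance of false matches, so it tends to suit fuzzy rules better than strict ones.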
Each field is also configured with a numeric weight that determines the relative importance of a match on that field. When a match is discovered on a field, that field's weight is added to the total weight for the rule. After each field is checked, if the total weight is equal to or greater than the numerical threshold set for the rule, the contacts being compared are flagged as suspected duplicates.
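The weight and threshold arithmetic can be sketched in Python as follows. This is an illustrative model only, not CiviCRM's implementation: the names `RuleField`, `total_weight` and `is_suspected_duplicate`, the dictionary keys, the sample email address, and the specific weights and thresholds are all assumptions chosen to mirror the Mike Tael and Michael Tael example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleField:
    """One field of a dedupe rule (hypothetical structure for illustration)."""
    name: str                     # key of the field in a contact dictionary
    weight: int                   # added to the total when this field matches
    length: Optional[int] = None  # compare only the first N characters if set


def field_matches(value_a, value_b, length=None):
    # Assumption: empty values never match; comparison ignores case.
    if not value_a or not value_b:
        return False
    a, b = value_a.strip().lower(), value_b.strip().lower()
    if length is not None:
        a, b = a[:length], b[:length]
    return a == b


def total_weight(contact_a, contact_b, fields):
    """Sum the weights of every rule field on which the two contacts match."""
    return sum(
        f.weight
        for f in fields
        if field_matches(contact_a.get(f.name), contact_b.get(f.name), f.length)
    )


def is_suspected_duplicate(contact_a, contact_b, fields, threshold):
    """Flag the pair when the total weight reaches or exceeds the threshold."""
    return total_weight(contact_a, contact_b, fields) >= threshold


mike = {"first_name": "Mike", "last_name": "Tael", "email": "mtael@example.org"}
michael = {"first_name": "Michael", "last_name": "Tael", "email": "mtael@example.org"}

# Strict-style rule: all three fields must match in full to reach 15.
strict_fields = [
    RuleField("email", 5),
    RuleField("first_name", 5),
    RuleField("last_name", 5),
]
# Looser variant: the first name is compared on its first 2 characters only.
fuzzy_fields = [
    RuleField("email", 5),
    RuleField("first_name", 5, length=2),
    RuleField("last_name", 5),
]

print(is_suspected_duplicate(mike, michael, strict_fields, threshold=15))  # False (10 < 15)
print(is_suspected_duplicate(mike, michael, fuzzy_fields, threshold=15))   # True (15 >= 15)
```

With equal weights and a threshold equal to their sum, every field must match. Lowering the threshold, or giving one field a weight that reaches the threshold on its own, lets a subset of fields trigger a match.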
If you notice duplicate contacts within a set of search results, you can merge them directly from the search results instead of using the separate Find and Merge Duplicate Contacts process. This is a great way to clean up your database during your everyday workflow with minimal disruption.