dtSearch considerations when searching for Exchange Strings
In addition to the usual SMTP user@domain address format, an email may contain a participant’s Exchange address if it was sent between Exchange users. This presents certain challenges when searching exhaustively for all messages in which a given person may have participated, such as when preparing for an interview or when organizing documents for an individual to review for data privacy.
An efficient way to search for documents containing a participant’s name or known email addresses is to set up a search terms report (STR), which relies on dtSearch. We’ll assume the search index includes the Extracted Text and all email sender and recipient fields, and that it uses the default alphabet file settings. This is a very common general-purpose dtSearch setup.
The Exchange String appears in a format similar to the below, depending on the organization’s setup:
Karen Murphy <”cn=kmurphy/ou=engineering_field_development_group/o=gofaster/c=ca”>
The intuitive approach would be to copy the string exactly as it appears in the sender/recipient field or body text and use it as a search term. However, this would result in 0 hits. Let’s take a closer look at why.
We can immediately spot three things:
- The display name portion is already covered by a karen w/2 murphy search term, which should be part of your communication STR. We can trim it off to shorten the string and to avoid missing messages where the display name is absent, since the full string would return 0 hits on those.
- There are non-alphanumeric characters present: <, >, “, =, and /.
  - The equals sign is a numerical pattern search operator. The index treats = as a space, so it is never stored, while the search treats it as an operator rather than a literal character. Leaving it in the search term therefore results in 0 hits. We can simply substitute a space in its place. The same applies to any other search operator character that might appear in the string, such as ?, #, *, or %.
  - The <” and “> portions can be trimmed off. The < and > characters are treated as spaces by the index. The “ character is also treated as a space by the index, and since it is not needed to mark an exact phrase in this term, it is superfluous.
  - The / character is treated as a space by the index and is ignored when entered as part of a search string, so it is not a problem. We can leave it as it is.
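- The administrative group portion, engineering_field_development_group, is 35 characters long. This exceeds 32 characters, so the word is truncated in the index. In theory the full string should still work, but in practice it can make the term return 0 hits. Shortening the word to fewer than 32 characters and ending it with the * wildcard avoids the problem.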
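To see why the copied string fails, it can help to model how the index sees it. The following is a minimal Python sketch, not dtSearch's actual tokenizer. It assumes that only the characters discussed above are treated as spaces, that the underscore is kept as part of a word, and that words are cut at 32 characters. The names indexed_words, SPACE_CHARS, and MAX_WORD_LEN are illustrative only.

```python
import re
from typing import List

# Assumption: only the characters discussed in this article are modelled as
# spaces. A real alphabet file covers many more characters.
SPACE_CHARS = '<>"“”=/'
MAX_WORD_LEN = 32  # word length at which the article says truncation occurs


def indexed_words(text: str) -> List[str]:
    """Approximate the words an index would store for `text`."""
    separators = "[" + re.escape(SPACE_CHARS) + r"\s]+"
    words = [w for w in re.split(separators, text.lower()) if w]
    # Words longer than the limit are cut off, not dropped.
    return [w[:MAX_WORD_LEN] for w in words]


raw = 'Karen Murphy <"cn=kmurphy/ou=engineering_field_development_group/o=gofaster/c=ca">'
print(indexed_words(raw))
```

Under these assumptions, the administrative group is stored as engineering_field_development_gr, and no stored word contains =, <, >, or a quotation mark.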
Let’s rewrite the term as follows:
cn kmurphy/ou engineering_field_development_g*/o gofaster/c ca
This version should run reliably and return hits on Karen Murphy’s Exchange address. The ultimate test of any search term is whether it returns the intended results. If you find an Exchange address in a document set and the term returns 0 documents when run against dtSearch, something is wrong.
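When many participants need Exchange terms, the manual rewrite above can be scripted. The following Python sketch is one possible approach under stated assumptions, not a dtSearch or Relativity feature. The function name exchange_search_term is hypothetical, the operator list contains only the characters named in this article, and long words are cut to 31 characters plus * to mirror the example term.

```python
import re

MAX_WORD_LEN = 32         # word length at which the article says truncation occurs
OPERATOR_CHARS = "=?#*%"  # assumption: only the operators named in the article


def exchange_search_term(raw: str) -> str:
    """Turn a copied Exchange string into a search term (illustrative sketch)."""
    # Keep only the part between < and >, which drops the display name.
    match = re.search(r"<(.*)>", raw)
    address = match.group(1) if match else raw
    # Quotation marks add nothing to this term, so remove them.
    address = re.sub(r'["“”]', "", address)
    # Replace search operator characters with spaces.
    address = re.sub("[" + re.escape(OPERATOR_CHARS) + "]", " ", address)

    def shorten(m: "re.Match[str]") -> str:
        word = m.group(0)
        # Cut over-long words below the limit and end them with a wildcard.
        if len(word) > MAX_WORD_LEN:
            return word[: MAX_WORD_LEN - 1] + "*"
        return word

    # A word here is any run of characters other than whitespace and /.
    address = re.sub(r"[^\s/]+", shorten, address)
    return " ".join(address.split())


raw = 'Karen Murphy <"cn=kmurphy/ou=engineering_field_development_group/o=gofaster/c=ca">'
print(exchange_search_term(raw))
# cn kmurphy/ou engineering_field_development_g*/o gofaster/c ca
```

Any generated term should still be tested against a document known to contain the address before it is added to the STR.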