What to map from the Data Source?

Indexing can be seen as a mapping function from objects in the data source to documents in the index. While this mapping may seem obvious at first, it should not be overlooked, because it determines how the search engine behaves.

There is not always a one-to-one mapping between the unit objects in the data source and the documents in the index.

For example, suppose you are writing a connector for a data source that contains emails. Should users be able to find an email based on the content of its attachments? Most probably yes, so the connector will probably map an email and all its attachments to a single document.

Should it also be possible to find a whole discussion thread by querying with a quote from one of its emails? If so, the connector will probably push, in addition to the previous documents, one document per thread, into which the content of all the emails in that thread is mapped.
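The email mapping described above can be sketched in a few lines of Python. This is only an illustration, not the connector API: the `Email` class, its field names, the `uri` scheme, and the dictionary-based document shape are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Email:
    # Hypothetical source object; the field names are assumptions.
    id: str
    thread_id: str
    subject: str
    body: str
    attachment_texts: list[str] = field(default_factory=list)


def email_to_document(email: Email) -> dict:
    """Map one email and all its attachments to a single document."""
    text = "\n".join([email.body, *email.attachment_texts])
    return {"uri": f"email/{email.id}", "title": email.subject, "text": text}


def threads_to_documents(emails: list[Email]) -> list[dict]:
    """Map all emails of the same thread to one additional document."""
    by_thread = defaultdict(list)
    for email in emails:
        by_thread[email.thread_id].append(email)
    return [
        {
            "uri": f"thread/{thread_id}",
            "title": members[0].subject,
            "text": "\n".join(m.body for m in members),
        }
        for thread_id, members in by_thread.items()
    ]


def map_emails(emails: list[Email]) -> list[dict]:
    """Return the per-email documents followed by the per-thread documents."""
    return [email_to_document(e) for e in emails] + threads_to_documents(emails)
```

With this sketch, three emails spread over two threads produce five documents: three per-email documents and two per-thread documents. A query quoting any email can then match the document of the thread it belongs to.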

Typical mappings include:

  • Emails / forums (to make a thread findable, you could have both):

    • 1 email = 1 document

    • All emails belonging to the same thread = 1 document

  • Enovia

    • 1 object made of several parts = 1 document

  • Database

    • Star or snowflake schema join = 1 document

      Note: To aggregate data this way, you can use the Consolidation Server.
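As a rough illustration of the "join = 1 document" idea for databases, the sketch below joins a small star schema and emits one document per fact row. It uses Python's standard `sqlite3` module with an in-memory database. The tables (`sale`, `customer`, `product`), the column names, and the dictionary-based document shape are hypothetical, and the sketch does not use or represent the Consolidation Server.

```python
import sqlite3

# Build a tiny in-memory star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE sale (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        product_id  INTEGER REFERENCES product(id),
        amount REAL
    );
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO product  VALUES (10, 'Widget');
    INSERT INTO sale     VALUES (100, 1, 10, 25.0);
    """
)


def sale_documents(connection: sqlite3.Connection) -> list[dict]:
    """Join the fact table with its dimensions: one joined row = one document."""
    rows = connection.execute(
        """
        SELECT s.id AS sale_id, s.amount, c.name AS customer, p.label AS product
        FROM sale AS s
        JOIN customer AS c ON c.id = s.customer_id
        JOIN product  AS p ON p.id = s.product_id
        ORDER BY s.id
        """
    )
    return [
        {
            "uri": f"sale/{row['sale_id']}",
            "customer": row["customer"],
            "product": row["product"],
            "amount": row["amount"],
        }
        for row in rows
    ]


documents = sale_documents(conn)
```

Each resulting document carries the customer name and product label alongside the sale itself, so a search on any of these values can return the sale.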