Defining the Application Scope

The first question to ask yourself is what you want to search. You need a good understanding of the document corpus to configure your SBA properly.

You must know the type of data provided by your data sources:

  • Unstructured data, for example, file system documents such as PDFs, image files, and Microsoft Office files.

  • Structured data, for example, data coming from a database accessed through JDBC.

For each type of data source that will feed Exalead CloudView with data, what do you want to index? Do you want to index all data, or just a part of it?

For example, if you index the content of a file system, do you want to exclude some files? For a database, do you want to retrieve all tables? Which primary keys will you use to relate the tables and give meaning to your data?


Tutorial: What Is Our Data Source

For this tutorial, we will use the sample SQLite database located in: <INSTALLDIR>/docs/sample_database/data.db

Our data is therefore structured: it comes from a database with four tables, which use the id field as primary key (PK) to make inner joins between them. The schema of the database is shown below.

Database schema for the sample database
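
Before configuring anything, it can help to confirm the tables and primary keys yourself. The following is a minimal sketch using Python's standard sqlite3 module, not part of CloudView. The function name describe_schema is hypothetical, and the path you pass in is assumed to be the data.db file under your own <INSTALLDIR>.

```python
import sqlite3


def describe_schema(db_path):
    """Return {table_name: [(column_name, declared_type, is_primary_key), ...]}.

    Note: sqlite3.connect creates an empty file if db_path does not exist,
    so check the path first when pointing this at a real installation.
    """
    schema = {}
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every object in the database; keep user tables only.
        tables = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
        for (table,) in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
            # pk is 0 for ordinary columns and > 0 for primary key columns.
            quoted = table.replace('"', '""')
            columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            schema[table] = [
                (name, ctype, pk > 0) for _, name, ctype, _, _, pk in columns
            ]
    finally:
        conn.close()
    return schema
```

If the sample database matches the description above, calling describe_schema on it should list four tables, each with the id column flagged as primary key.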

Tutorial: What We Want to Do Functionally

We want to use this database to create an SBA for an international clothing manufacturer. Its sales director wants the application to serve various analytics needs, such as showing the quantity of each article sold per city, the color of the most popular article in each country, the best salesmen and saleswomen internationally, and which articles they sold best to clothing stores.
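
To make the first of these requirements concrete, the sketch below computes the quantity of each article sold per city with a plain SQL aggregation in Python. The sale table, its columns (city, article, quantity), and all rows are invented for illustration; they are not the sample database's schema. The example only illustrates the shape of the result the sales director is asking for.

```python
import sqlite3


def quantity_per_city_and_article(conn):
    """Return [(city, article, total_quantity), ...] sorted by city, then article."""
    return conn.execute(
        "SELECT city, article, SUM(quantity) "
        "FROM sale "
        "GROUP BY city, article "
        "ORDER BY city, article"
    ).fetchall()


# Hypothetical table and data, invented for this example only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale ("
    "id INTEGER PRIMARY KEY, city TEXT, article TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sale (city, article, quantity) VALUES (?, ?, ?)",
    [
        ("Paris", "shirt", 3),
        ("Paris", "shirt", 2),
        ("Paris", "scarf", 1),
        ("Rome", "shirt", 4),
    ],
)
totals = quantity_per_city_and_article(conn)
conn.close()
```

Each returned row pairs a city and an article with the summed quantity, which is the kind of grouping the other requirements (per country, per salesperson) would also rely on.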