Customizing Document Analysis

NETVIBES OnePart is delivered with a vast number of document processors that can alter documents in analysis pipelines. By assembling these processors, most analysis tasks can be performed.

However, for advanced and custom operations, it is often required or more convenient to write custom document processors.

Note: For functional details on document processors, see the Exalead CloudView Configuration Guide.
Important: Requires the OnePart Customization SDK

A custom document processor is a Java class that extends the com.exalead.pdoc.analysis.CustomDocumentProcessor class. It manipulates the document as a com.exalead.pdoc.ProcessableDocument object.

What can it do?

A document processor can:

  • modify, create or remove document metas
  • modify, create or remove document parts
  • discard a document: ignore it, or delete it from the index

A document processor cannot:

  • modify the URI or stamp of a document
  • create new documents
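To make these rules concrete, the sketch below models them with a hypothetical SketchDocument class and a hypothetical TrimTitleStep processing step. This is not com.exalead.pdoc.ProcessableDocument and does not reproduce its methods. It only shows the shape of the contract: metas and parts can be created, modified or removed, the document can be discarded (ignored or deleted from the index), and the URI and stamp are read-only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a document during analysis. NOT the real
 * com.exalead.pdoc.ProcessableDocument API: it only mirrors the rules above.
 */
final class SketchDocument {
    /** The two ways a processor can discard a document, plus "keep it". */
    enum Discard { NONE, IGNORE, DELETE_FROM_INDEX }

    private final String uri;   // final: a processor cannot modify the URI
    private final String stamp; // final: a processor cannot modify the stamp
    private final Map<String, List<String>> metas = new LinkedHashMap<>();
    private final Map<String, String> parts = new LinkedHashMap<>();
    private Discard discard = Discard.NONE;

    SketchDocument(String uri, String stamp) {
        this.uri = uri;
        this.stamp = stamp;
    }

    String uri() { return uri; }
    String stamp() { return stamp; }

    // Metas: create, modify or remove.
    void addMetaValue(String name, String value) {
        metas.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
    void replaceMeta(String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        metas.put(name, values);
    }
    void removeMeta(String name) { metas.remove(name); }
    List<String> metaValues(String name) { return metas.getOrDefault(name, List.of()); }

    // Parts: create, modify or remove.
    void putPart(String name, String content) { parts.put(name, content); }
    void removePart(String name) { parts.remove(name); }
    boolean hasPart(String name) { return parts.containsKey(name); }

    // Discard: ignore the document, or delete it from the index.
    void discard(Discard mode) { discard = mode; }
    Discard discardMode() { return discard; }

    // Deliberately no setters for uri/stamp and no way to create new documents.
}

final class TrimTitleStep {
    /** Hypothetical step: trim the title meta, or ignore documents that have none. */
    static void process(SketchDocument doc) {
        List<String> titles = doc.metaValues("title");
        if (titles.isEmpty()) {
            doc.discard(SketchDocument.Discard.IGNORE);
        } else {
            doc.replaceMeta("title", titles.get(0).trim());
        }
    }
}
```

In a real processor the same decisions are made against the ProcessableDocument object passed to process(); check the SDK Javadoc for the actual method names.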

Samples

Several samples of document processors are available in the Exalead CloudView kit, in INSTALLDIR/sdk/java-customcode/samples/document-processors. You can build the samples using Apache Ant. This will create a plugin zip file, which can be installed in Exalead CloudView.

Debugging

The process() method of the CustomDocumentProcessor class receives a DocumentProcessingContext argument. Use the DocumentProcessingContext methods to report any error or warning concerning the document. This ensures that all error context is captured for efficient debugging.
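The reporting pattern can be sketched independently of the SDK. The reporting method names of DocumentProcessingContext are not given on this page, so the IssueReporter interface, CollectingReporter and MassMetaParser below are hypothetical stand-ins. The idea shown is to report a problem together with the document URI and the offending raw value, rather than losing that context.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the reporting methods of DocumentProcessingContext. */
interface IssueReporter {
    void warning(String uri, String message);
    void error(String uri, String message);
}

/** Collects messages in memory, e.g. for unit tests of processor logic. */
final class CollectingReporter implements IssueReporter {
    final List<String> messages = new ArrayList<>();

    @Override public void warning(String uri, String message) { messages.add("WARNING " + uri + ": " + message); }
    @Override public void error(String uri, String message) { messages.add("ERROR " + uri + ": " + message); }
}

final class MassMetaParser {
    /**
     * Parses a numeric mass meta (hypothetical example). Problems are reported
     * with the document URI and the raw value, and null is returned.
     */
    static Double parse(String uri, String raw, IssueReporter reporter) {
        if (raw == null || raw.isBlank()) {
            reporter.warning(uri, "mass meta is missing");
            return null;
        }
        try {
            return Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            reporter.error(uri, "cannot parse mass value '" + raw + "'");
            return null;
        }
    }
}
```

Keeping the parsing logic behind a small interface like this also makes it testable outside Exalead CloudView. Inside a real processor, the calls would go to the DocumentProcessingContext methods instead.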

OnePart document processors

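By default, OnePart deploys with several plugins, including the document processor plugins listed below. The class id of each of these document processors is com.exalead.onepart.analysis.processors followed by the <class_id_name> given in the table below (for example, com.exalead.onepart.analysis.processors.FamilyAndTypeSetter).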

  • onepart-documentprocessors containing the OnePart specific document processors

  • apps-documentprocessors containing the common NETVIBES apps specific document processors.

    Important: The OnePart and apps document processors must not be removed from the analysis pipeline

OnePart-specific document processors cannot be edited in the Administration Console because they are packaged as custom document processors. To change the behavior of one of these processors, you must first disable it and then deploy your own custom document processor. The following table lists the OnePart processors by class id, with their corresponding document processor names.

Document Processor | class_id_name | Description
------------------ | ------------- | -----------
family, type | .FamilyAndTypeSetter | Sets the family and type.
Measure adapter | .MeasureAdapter | Adapts numerical values with a defined unit.
Elastic data model parser | .ElasticDataModelParser | Maps the EDM metas defined in the EDM configuration file.
file - folder | .FileFolderMetaSetter | Sets the folder location.
rename_dwg_block_refs | .NumberedMetaListRenamer | Renames metas that share the same prefix and are suffixed with contiguous increasing numbers starting from 0.
extract_file_name_from_xref | .SplitJoin | Splits the meta value using a separator and joins a configurable number of the tokens.
version collapsing | .VersionCollapsing | Sets metas for the collapsing feature.
SolidWorksDateCleaner | .SolidWorksDateCleaner | Handles dates coming from SOLIDWORKS files.
onePart ModelClass | .DatamodelClassSetter | Sets the class in the data model.
OnePartAttributesHoles | .BRepHoleCount | Sets the part hole count.
OnePartHolesGeometry | .BRepHoleGeometry | Sets the geometry of the part holes.
SolidWorks Drawing Extractor | .SolidWorksDrawingTextExtractor | Extracts text from a SOLIDWORKS drawing file.
2d3d_original_path, 2d3d_original_file | .OriginalFilePathSetter | SOLIDWORKS: sets the original_file and original_path metas.
2d3d_author | .AuthorSetter | Sets the SOLIDWORKS author meta and determines the best author meta value.
Best date setter | .DatesSetter | Sets the date meta values (creation and modification).
3d_shape_part_number (defaults as file_name) | .PartNumberSetter | Sets the part number of a 3D file. If none is found, the file name is used.
title_proc | .TitleSetter | Sets the title for 3D documents.
JS API metas setter (js_api_*) | .JsApiMetaSetter | Sets metas for OnePart in CAD integration.
3d_shape_volume, 3d_shape_mass, 3d_shape_density | .PhysicalMeasuresSetter | Sets mass, volume and density for 3D documents.
relationship_builder | .RelationshipBuilder | Creates child id links and the children list for CAD files.
child_id_finger_print | .ChildIdFingerPrint | Calculates the fingerprint of the child_id value.
3d_shape_bbbox | .BoundingBoxExtractor | Reads SOLIDWORKS XML to extract bounding box data.
mech_feature_extractor | .MechanicalFeatureExtractor | Extracts mechanical features for SOLIDWORKS.
mech_feature_count | .MechanicalFeatureCounter | Counts each type of mechanical feature, using input from a semantic pipe.
Configuration mapper | .ConfigurationsMapper | Extracts the configuration and maps it to the OnePart data model.
3d_shape_material_rename | .MaterialMetaCleaner | Normalizes the value of the material meta.
Signature Document Processors | n/a | A group of processors that handle 3D signatures.
width_height_ratio setter | .WidthHeightRatioSetter | Sets the width_height_ratio meta, used for thumbnail display.
meta_copier | .MetaCopier | Copies a meta to another meta.
software_version | .SoftwareVersionExtractor | Extracts the CAD software version to fill the family facet.
Thumbnail_Index_main | .ThumbnailIndex | Sets the metas and parts necessary to store thumbnails in the index.
limit_facet_deepness | .HierarchicalFacetLimiter | Cleans meta values used for facets and prevents them from being split when not needed.
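Several rows of the table above describe simple meta transformations. The sketch below illustrates the kind of logic involved, written as plain Java without the SDK types. It is not the OnePart implementation: the class ProcessorSketches, its method names, and choices such as joining the last tokens in splitJoinLast or using "/" as the facet level separator are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical, SDK-free sketches of the transformations described by a few
 * rows of the table. NOT the OnePart implementations; parameter choices are assumptions.
 */
final class ProcessorSketches {

    /** Like .SplitJoin: split on a separator, then join the LAST `count` tokens (assumed). */
    static String splitJoinLast(String value, String separator, int count) {
        String[] tokens = value.split(Pattern.quote(separator));
        int from = Math.max(0, tokens.length - count);
        return String.join(separator, Arrays.copyOfRange(tokens, from, tokens.length));
    }

    /** Like .NumberedMetaListRenamer: rename oldPrefix0, oldPrefix1, ... up to the first gap. */
    static Map<String, String> renameNumbered(Map<String, String> metas, String oldPrefix, String newPrefix) {
        int n = 0;
        while (metas.containsKey(oldPrefix + n)) {
            n++; // length of the contiguous run starting from 0
        }
        Map<String, String> out = new LinkedHashMap<>(metas);
        for (int i = 0; i < n; i++) {
            out.remove(oldPrefix + i);
        }
        for (int i = 0; i < n; i++) {
            out.put(newPrefix + i, metas.get(oldPrefix + i));
        }
        return out; // metas after a gap (e.g. oldPrefix3 when oldPrefix2 is missing) are left as-is
    }

    /** Like .PartNumberSetter: use the part number if present, else fall back to the file name. */
    static String partNumberOrFileName(String partNumber, String fileName) {
        return (partNumber == null || partNumber.isBlank()) ? fileName : partNumber.trim();
    }

    /** Like .WidthHeightRatioSetter: width / height, or null when the height is not positive. */
    static Double widthHeightRatio(double width, double height) {
        if (height <= 0) {
            return null;
        }
        return width / height;
    }

    /** Like .MechanicalFeatureCounter: count occurrences of each feature type. */
    static Map<String, Integer> countByType(List<String> featureTypes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String type : featureTypes) {
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    /** Like .HierarchicalFacetLimiter: keep at most maxDepth levels of a '/'-separated path (assumed). */
    static String limitDepth(String path, int maxDepth) {
        String[] levels = path.split("/");
        return String.join("/", Arrays.copyOfRange(levels, 0, Math.min(maxDepth, levels.length)));
    }
}
```

For example, with these assumptions splitJoinLast("C:\\cad\\parts\\bracket.sldprt", "\\", 1) keeps only the file name, bracket.sldprt. Keeping such logic in pure functions makes it easy to test before wrapping it in a custom document processor.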

Note: Do not use an existing class id when writing your own processor.
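One way to guard against a clash is to check the candidate class id against the OnePart ids, which are built from the com.exalead.onepart.analysis.processors prefix and the class_id_name column of the table. The ClassIdCheck class below is a hypothetical helper, it copies only a few table entries, and com.example.analysis.MyTitleSetter is an invented example id.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class ClassIdCheck {
    /** Prefix of the OnePart processor class ids, as stated in this section. */
    static final String ONEPART_PREFIX = "com.exalead.onepart.analysis.processors";

    /** A few class_id_name values copied from the table (extend with the full list). */
    static final List<String> ONEPART_CLASS_ID_NAMES = List.of(
            ".FamilyAndTypeSetter", ".DatamodelClassSetter", ".TitleSetter", ".MetaCopier");

    /** Concatenates the prefix and a class_id_name such as ".TitleSetter". */
    static String fullClassId(String classIdName) {
        return ONEPART_PREFIX + classIdName;
    }

    /** Full class ids of the OnePart processors listed above. */
    static Set<String> reservedClassIds() {
        return ONEPART_CLASS_ID_NAMES.stream()
                .map(ClassIdCheck::fullClassId)
                .collect(Collectors.toSet());
    }

    /** True if the candidate reuses one of the listed OnePart class ids. */
    static boolean collidesWithOnePart(String candidate) {
        return reservedClassIds().contains(candidate);
    }
}
```

A stricter convention, which is a suggestion rather than a documented rule, is to place custom processors in your own package so that they never share the OnePart prefix.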

Add a custom document processor to your analysis pipeline

Once you have developed your custom document processor, you can add it to your document analysis pipeline in the Exalead CloudView Administration Console.

  1. Package and upload the plugin containing your document processor.

    See the Exalead CloudView Programmer's Guide: Package your custom components.

  2. Open the Administration Console at http://HOSTNAME:<BASEPORT+1>/admin.
  3. Select the ap0 pipeline.

    Note: OnePart deploys with two semantic pipelines: ap0 and sem. Custom processors must be added to the primary pipeline ap0. The ap0 pipeline calls the sem pipeline.

  4. Expand Custom and drag a Custom Document Processor to the Current processors panel.
  5. Fill in the Class id (available document processors will be suggested automatically).
  6. If the processor requires additional configuration, fill in its configuration keys.

Write custom document processors

Document processors can be written directly in the Administration Console, using the integrated code editor.

  1. Open the Administration Console at http://HOSTNAME:<BASEPORT+1>/admin.
  2. Go to Index > Data Processing > Pipeline name > Document Processors.
  3. Expand Custom and drag a Java Document Processor to the Current processors panel.
  4. Select Inline Java, then click Edit java and write your processor code in the editor.
  5. Click Check source code to verify that the code compiles correctly.
  6. Click Accept and then Apply.

    Your custom document processor is now active.

Disable an existing custom document processor

OnePart custom document processors cannot be edited in the Administration Console. If you need to change the behavior of one of these processors then you must disable it and then deploy your new custom document processor.

Note: Your custom document processor should have a unique class id. See the table above in OnePart document processors.
Important: Requires the OnePart Customization SDK

  1. In the Administration Console, go to Index > Data Processing.
  2. Click the ap0 analysis pipeline.
  3. In the Document Processors tab, click the processor you want to disable, for example, onePart ModelClass.
  4. Select Disable processor.

  5. Click Apply.

You can now deploy your own custom document processor using plugins. See Exalead CloudView Programmer's Guide: Writing Custom Document Processors.