The second method processes the input document and produces extracted data
or previews.
/**
* Transform the part 'part' into the destination document, using the format
* 'format'. Use getSupportedOutputMime() to get the list of supported output
* MIME types.
*
* Note: input and output may be the same object.
*
* @param part
* the input part.
* @param input
* the source document (the part probably belongs to this document)
* @param output
* the target document (produced parts are stored inside this document)
* @param format
* the transformation format
* @throws UnsupportedInputFormatException
* if the input format is not supported by the filter
* (in this case, the upstream client gives up on the current filter)
* @throws UnsupportedOutputFormatException
* if the output format is not supported by the filter
* (in this case, the upstream client may retry with a different format,
* using the same transformer)
* @throws NoSuchMethodException
* if the method is not supported
* @throws TransformationException
* upon error (in this case, the upstream client may choose to
* give up on the input, or select another filter)
**/
public void transform(DocumentPart part, ProcessableDocument input,
ProcessableDocument output, Format format)
throws TransformationException, UnsupportedInputFormatException,
UnsupportedOutputFormatException, NoSuchMethodException;
This function takes:
- a part as input (the part contains data and associated metadata),
- the related input document, which is generally unused,
- the output document, to which multiple parts may be added,
- and the requested format, which carries the output MIME type and indicates
whether information extraction or display processing is requested.
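The exception contract documented for transform() implies a retry policy on the caller's side. The following is a minimal sketch of such a client under stated assumptions: RetryingClient, its reduced string-based Transformer interface and the local exception classes are hypothetical stand-ins, not the real DocumentPart/ProcessableDocument API. It retries the same transformer with the next output MIME type on UnsupportedOutputFormatException, and abandons the transformer on UnsupportedInputFormatException or TransformationException.

```java
import java.util.List;

/**
 * Hypothetical sketch of an upstream client reacting to the exceptions
 * documented for transform(). All types are simplified stand-ins.
 */
class RetryingClient {
    static class TransformationException extends Exception {
        TransformationException(String msg) { super(msg); }
    }
    static class UnsupportedInputFormatException extends Exception {
        UnsupportedInputFormatException(String msg) { super(msg); }
    }
    static class UnsupportedOutputFormatException extends Exception {
        UnsupportedOutputFormatException(String msg) { super(msg); }
    }

    /** Hypothetical reduced transformer: returns the produced content as a string. */
    interface Transformer {
        String transform(String part, String outMime)
                throws TransformationException, UnsupportedInputFormatException,
                       UnsupportedOutputFormatException;
    }

    /**
     * Tries each output MIME type in order with the same transformer.
     * Returns the first result, or null when this transformer must be abandoned.
     */
    static String tryFormats(Transformer t, String part, List<String> outMimes) {
        for (String mime : outMimes) {
            try {
                return t.transform(part, mime);
            } catch (UnsupportedOutputFormatException e) {
                // Same transformer, different output format: try the next one.
                continue;
            } catch (UnsupportedInputFormatException e) {
                // The filter does not handle this input: give up on it entirely.
                return null;
            } catch (TransformationException e) {
                // Processing error: this sketch gives up on the transformer
                // (a real client could also select another filter).
                return null;
            }
        }
        return null; // none of the requested formats was supported
    }
}
```

Rejecting the input ends the loop immediately, while rejecting only the output format lets the client keep the transformer and try the next MIME type.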
When producing content:
- Each produced file must be stored in its own part, created through the
output object's addPart() method.
- Referenced parts must advertise the proper MIME type and carry proper
filenames. If the first (HTML) part embeds relative links to resources, those
resources must be named with exactly the same relative filenames.
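To illustrate the naming rule for referenced resources, here is a minimal in-memory sketch. OutputDoc and Part are hypothetical stand-ins for ProcessableDocument and DocumentPart, and the path preview_files/image1.png is an arbitrary example. The HTML master part is added first, the image part it links to advertises image/png and uses the very same relative filename, and linksResolve() checks, with a deliberately simplified regex, that every relative src/href in the first part matches a stored part's filename.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: HTML preview plus a resource part named after its relative link. */
class PreviewWithResources {
    /** Hypothetical stand-in for DocumentPart. */
    static class Part {
        final String name;
        String filename;
        String mime;
        byte[] content;
        Part(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for ProcessableDocument. */
    static class OutputDoc {
        final List<Part> parts = new ArrayList<>();
        /** Plays the role of addPart(): creates, registers and returns a new part. */
        Part addPart(String name) {
            Part p = new Part(name);
            parts.add(p);
            return p;
        }
    }

    // Captures src="..." and href="..." values (simplified, not a full HTML parser).
    private static final Pattern LINK = Pattern.compile("(?:src|href)=\"([^\"]+)\"");

    static OutputDoc build(byte[] imageBytes) {
        OutputDoc out = new OutputDoc();
        String relImage = "preview_files/image1.png"; // arbitrary example path

        // Leading part: the HTML master document, added first.
        Part html = out.addPart("preview");
        html.mime = "text/html";
        html.filename = "preview.html";
        html.content = ("<html><body><img src=\"" + relImage + "\"></body></html>")
                .getBytes(StandardCharsets.UTF_8);

        // Referenced resource: proper MIME type and the same relative filename.
        Part img = out.addPart("image1");
        img.mime = "image/png";
        img.filename = relImage;
        img.content = imageBytes;
        return out;
    }

    /** Returns true when every relative link in the first part names a stored part. */
    static boolean linksResolve(OutputDoc doc) {
        String html = new String(doc.parts.get(0).content, StandardCharsets.UTF_8);
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            String target = m.group(1);
            if (target.contains("://")) continue; // absolute URL, not a stored part
            boolean found = false;
            for (Part p : doc.parts) {
                if (target.equals(p.filename)) { found = true; break; }
            }
            if (!found) return false;
        }
        return true;
    }
}
```

Renaming the image part without updating the HTML would make linksResolve() return false, which is exactly the broken-preview case the rule guards against.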
Note: The part name is usually "preview" for a preview and "document" for
extracted metadata, but naming is free. The first part must be the leading
part; in particular, if the produced content is an HTML preview, the first part
must be the HTML master document.
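A caller, or a unit test, might verify the ordering rule after transform() returns. The sketch below assumes a hypothetical PartInfo holder (name and advertised MIME type only) and takes text/html as the HTML preview MIME type. Since part names are free, it checks only that at least one part exists and that, for an HTML preview, the first part is the HTML master.

```java
import java.util.List;

/** Hypothetical post-transform check of the leading-part rule. */
class LeadingPartCheck {
    /** Hypothetical stand-in holding only a part's name and advertised MIME type. */
    static final class PartInfo {
        final String name;
        final String mime;
        PartInfo(String name, String mime) { this.name = name; this.mime = mime; }
    }

    /**
     * Returns true when the first part can act as the leading part: at least one
     * part was produced and, for an HTML preview, the first part is HTML.
     */
    static boolean leadingPartOk(List<PartInfo> parts, String requestedMime) {
        if (parts.isEmpty()) return false;
        if ("text/html".equalsIgnoreCase(requestedMime)) {
            return "text/html".equalsIgnoreCase(parts.get(0).mime);
        }
        return true; // names are free, so nothing else is enforced here
    }
}
```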
The following example shows the skeleton of a transform() method:
@Override
public void transform(DocumentPart part, ProcessableDocument input,
        ProcessableDocument output, Format format)
        throws TransformationException, UnsupportedInputFormatException,
        UnsupportedOutputFormatException
{
    // Validate the requested output format: only plain text and HTML are supported
    final String outMime = format.getMime();
    final boolean isText = outMime.equalsIgnoreCase(Format.MIME_TEXT);
    final boolean isHtml = outMime.equalsIgnoreCase(Format.MIME_HTML);
    if (!isText && !isHtml) {
        throw new UnsupportedOutputFormatException("unsupported format");
    }
    // Validate the input format. isNotMyFormat() is the filter's own helper and
    // returns true when the part is NOT handled by this filter, so no extra negation
    final String mime = part.getComputedMime(); // detected MIME, available to the transformation
    if (isNotMyFormat(part.getFilename(), part.getForcedMime())) {
        throw new UnsupportedInputFormatException("unsupported MIME type");
    }
    // Transform
    try {
        final byte[] data = part.getContentAsBytes();
        ...
        if (isHtml) {
            // Prepare the leading part: the HTML master document
            final DocumentPart dp = output.addPart("preview");
            dp.setEncoding("UTF-8");
            dp.setForcedMime(outMime);
            ...
            dp.setContent(xml.toString().getBytes("UTF-8"));
        }
    } catch (IOException io) {
        // Wrap I/O errors (including UnsupportedEncodingException) as transformation errors
        throw new TransformationException(io);
    }
}
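Because the input check is easy to invert by accident, the validation step of the skeleton can be exercised on its own. This sketch is self-contained and runnable; the MIME constants, the application/x-example type, the .ex extension and the body of isNotMyFormat() are hypothetical placeholders for a real filter's values, and the exception classes are local stand-ins.

```java
import java.util.Locale;

/** Hypothetical, stand-alone version of the skeleton's validation step. */
class TransformValidation {
    // Assumed values; the real constants live in Format.
    static final String MIME_TEXT = "text/plain";
    static final String MIME_HTML = "text/html";

    static class UnsupportedInputFormatException extends Exception {
        UnsupportedInputFormatException(String m) { super(m); }
    }
    static class UnsupportedOutputFormatException extends Exception {
        UnsupportedOutputFormatException(String m) { super(m); }
    }

    /** Hypothetical input check: true when the part is NOT handled by this filter. */
    static boolean isNotMyFormat(String filename, String forcedMime) {
        if (forcedMime != null) {
            // A forced MIME type takes precedence over the filename.
            return !"application/x-example".equalsIgnoreCase(forcedMime);
        }
        return filename == null || !filename.toLowerCase(Locale.ROOT).endsWith(".ex");
    }

    /** Returns true for an HTML preview, false for plain text; throws otherwise. */
    static boolean validate(String outMime, String filename, String forcedMime)
            throws UnsupportedInputFormatException, UnsupportedOutputFormatException {
        boolean isText = MIME_TEXT.equalsIgnoreCase(outMime);
        boolean isHtml = MIME_HTML.equalsIgnoreCase(outMime);
        if (!isText && !isHtml) {
            throw new UnsupportedOutputFormatException("unsupported format: " + outMime);
        }
        // Reject the input only when it is NOT ours.
        if (isNotMyFormat(filename, forcedMime)) {
            throw new UnsupportedInputFormatException("unsupported input: " + filename);
        }
        return isHtml;
    }
}
```

Checking the output format first lets the client distinguish "retry with another format" from "give up on this filter" before any content is read.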