Custom Resource Creation
The user dictionary is a UTF-8 text file (user.txt) located in the directory KITDIR/resource/all-arch/subtokenizer/ID, where KITDIR is the root of the unzipped CloudView kit directory and ID is one of the following language identifiers: de (German), nl (Dutch), no (Norwegian).
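The following Python snippet is a minimal sketch of how the file location is derived from KITDIR and a language identifier. The function name and the sample KITDIR value are hypothetical and only illustrate the path layout described above.

```python
from pathlib import Path

# Language identifiers listed in the documentation.
SUPPORTED_LANGUAGES = {"de": "German", "nl": "Dutch", "no": "Norwegian"}

def user_dictionary_path(kitdir: str, lang_id: str) -> Path:
    """Return KITDIR/resource/all-arch/subtokenizer/ID/user.txt (hypothetical helper)."""
    if lang_id not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language identifier: {lang_id!r}")
    return Path(kitdir) / "resource" / "all-arch" / "subtokenizer" / lang_id / "user.txt"

if __name__ == "__main__":
    # "/opt/cloudview-kit" is a hypothetical KITDIR used only for illustration.
    print(user_dictionary_path("/opt/cloudview-kit", "de"))
```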
The file must use the following format (lines that do not match it are ignored):
- Lines starting with # are comments (ignored)
- Lines containing one word define words that must never be decompounded
- Lines containing three words explicitly define how the first word of the line must be decompounded into the second and third words
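The three line forms above can be illustrated with a small parser. This is a sketch under assumptions, not CloudView's own loader: it assumes words are separated by whitespace and that a later entry for the same word overrides an earlier one; the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# None means "never decompound this word"; a tuple gives the two parts.
Entry = Optional[Tuple[str, str]]

def parse_user_dictionary(text: str) -> Dict[str, Entry]:
    """Parse user.txt content into {word: None | (part1, part2)} (hypothetical helper)."""
    entries: Dict[str, Entry] = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        words = line.split()  # assumption: words are whitespace-separated
        if len(words) == 1:
            entries[words[0]] = None  # word that must not be decompounded
        elif len(words) == 3:
            entries[words[0]] = (words[1], words[2])  # explicit decompounding
        # any other word count (including blank lines) does not match and is ignored
    return entries

def load_user_dictionary(path: Path) -> Dict[str, Entry]:
    # The file is UTF-8, so read it with that encoding explicitly.
    return parse_user_dictionary(path.read_text(encoding="utf-8"))
```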
Note that words are matched case-insensitively but accents matter (Kuchen is not Küchen).
For example:

    # this is an ignored comment
    # this line states that Volkswagen shouldn't be split:
    volkswagen
    # this line forces decompounding of Lastwagenfahrer into Lastwagen+Fahrer:
    lastwagenfahrer lastwagen fahrer
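The matching rule (case-insensitive, accent-sensitive) can be sketched as a lookup that lowercases the word but leaves diacritics untouched. This assumes Python's str.lower() approximates CloudView's case folding, which the documentation does not specify; the "kuchen" entry is hypothetical and only demonstrates that Küchen does not match it.

```python
from typing import Dict, Optional, Tuple

# Entries from the example user.txt, stored in lowercase.
USER_DICTIONARY: Dict[str, Optional[Tuple[str, str]]] = {
    "volkswagen": None,
    "lastwagenfahrer": ("lastwagen", "fahrer"),
    "kuchen": None,  # hypothetical entry, not part of the example file
}

def lookup(word: str) -> Tuple[bool, Optional[Tuple[str, str]]]:
    """Return (found, parts) using case-insensitive, accent-sensitive matching.

    str.lower() folds case but keeps diacritics, so "Küchen" stays
    distinct from "kuchen" (assumed to mirror the documented behavior).
    """
    key = word.lower()
    if key not in USER_DICTIONARY:
        return False, None
    return True, USER_DICTIONARY[key]
```

With these entries, "VolksWagen" matches and is kept whole, "Lastwagenfahrer" is split into lastwagen + fahrer, and "Küchen" finds no entry.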