Uses of Class
org.apache.lucene.analysis.Tokenizer
Packages that use Tokenizer

org.apache.lucene.analysis
    Text analysis.
org.apache.lucene.analysis.classic
    Fast, general-purpose grammar-based tokenizers.
org.apache.lucene.analysis.cn.smart
    Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.core
    Basic, general-purpose analysis components.
org.apache.lucene.analysis.email
    Fast, general-purpose URLs and email addresses tokenizers.
org.apache.lucene.analysis.icu.segmentation
    Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
org.apache.lucene.analysis.ja
    Analyzer for Japanese.
org.apache.lucene.analysis.ko
    Analyzer for Korean.
org.apache.lucene.analysis.ngram
    Character n-gram tokenizers and filters.
org.apache.lucene.analysis.path
    Analysis components for path-like strings such as filenames.
org.apache.lucene.analysis.pattern
    Set of components for pattern-based (regex) analysis.
org.apache.lucene.analysis.standard
    Fast, general-purpose grammar-based tokenizer StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
org.apache.lucene.analysis.th
    Analyzer for Thai.
org.apache.lucene.analysis.util
    Utility functions for text analysis.
org.apache.lucene.analysis.wikipedia
    Tokenizer that is aware of Wikipedia syntax.
Uses of Tokenizer in org.apache.lucene.analysis

Methods in org.apache.lucene.analysis that return Tokenizer

final Tokenizer TokenizerFactory.create()
    Creates a TokenStream of the specified input using the default attribute factory.
abstract Tokenizer TokenizerFactory.create(AttributeFactory factory)
    Creates a TokenStream of the specified input using the given AttributeFactory.

Constructors in org.apache.lucene.analysis with parameters of type Tokenizer

TokenStreamComponents(Tokenizer tokenizer)
    Creates a new Analyzer.TokenStreamComponents from a Tokenizer.
TokenStreamComponents(Tokenizer tokenizer, TokenStream result)
    Creates a new Analyzer.TokenStreamComponents instance.
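A minimal sketch of how these constructors are typically used: a custom Analyzer instantiates a Tokenizer inside createComponents and wraps it in Analyzer.TokenStreamComponents, passing the end of the filter chain as the second argument when filters are attached. WhitespaceTokenizer and LowerCaseFilter are stand-ins chosen for illustration; the LowerCaseFilter import assumes a recent Lucene where it lives directly in org.apache.lucene.analysis.

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;

    public class LowercaseWhitespaceAnalyzer extends Analyzer {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new WhitespaceTokenizer();
        TokenStream result = new LowerCaseFilter(source);
        // Pass the Tokenizer alone, or the Tokenizer plus the end of the filter chain:
        return new TokenStreamComponents(source, result);
      }
    }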
Uses of Tokenizer in org.apache.lucene.analysis.classic

Subclasses of Tokenizer in org.apache.lucene.analysis.classic

final class ClassicTokenizer
    A grammar-based tokenizer constructed with JFlex.
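All of the subclasses on this page share the consumption contract inherited from Tokenizer: set the input with setReader, call reset() once, pull tokens with incrementToken(), then call end() (close() is handled by try-with-resources below). A sketch with ClassicTokenizer; the demo class and sample text are illustrative.

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.classic.ClassicTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class ClassicDemo {
      public static void main(String[] args) throws Exception {
        try (Tokenizer tok = new ClassicTokenizer()) {
          tok.setReader(new StringReader("The quick brown fox."));
          CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
          tok.reset();                 // mandatory before the first incrementToken()
          while (tok.incrementToken()) {
            System.out.println(term.toString());
          }
          tok.end();                   // finalizes offsets and end-of-stream state
        }
      }
    }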
Uses of Tokenizer in org.apache.lucene.analysis.cn.smart

Subclasses of Tokenizer in org.apache.lucene.analysis.cn.smart

class HMMChineseTokenizer
    Tokenizer for Chinese or mixed Chinese-English text.

Methods in org.apache.lucene.analysis.cn.smart that return Tokenizer
Uses of Tokenizer in org.apache.lucene.analysis.core

Subclasses of Tokenizer in org.apache.lucene.analysis.core

final class KeywordTokenizer
    Emits the entire input as a single token.
class LetterTokenizer
    A LetterTokenizer is a tokenizer that divides text at non-letters.
final class UnicodeWhitespaceTokenizer
    A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace.
final class WhitespaceTokenizer
    A tokenizer that divides text at whitespace characters as defined by Character.isWhitespace(int).

Methods in org.apache.lucene.analysis.core that return Tokenizer
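The practical difference between these classes shows up on punctuated text: WhitespaceTokenizer keeps "e-mail" together while LetterTokenizer splits it at the hyphen. A small comparison sketch; the helper method and sample text are illustrative.

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.LetterTokenizer;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class CoreTokenizerDemo {
      static void dump(Tokenizer tok, String text) throws Exception {
        tok.setReader(new StringReader(text));
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
        tok.reset();
        while (tok.incrementToken()) {
          System.out.print("[" + term + "] ");
        }
        tok.end();
        tok.close();
        System.out.println();
      }

      public static void main(String[] args) throws Exception {
        String text = "e-mail me at foo-bar";
        dump(new WhitespaceTokenizer(), text); // [e-mail] [me] [at] [foo-bar]
        dump(new LetterTokenizer(), text);     // [e] [mail] [me] [at] [foo] [bar]
      }
    }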
Uses of Tokenizer in org.apache.lucene.analysis.email

Subclasses of Tokenizer in org.apache.lucene.analysis.email

final class UAX29URLEmailTokenizer
    This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
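Where a plain UAX #29 tokenizer would break a URL or address at its punctuation, this tokenizer emits each as a single token and labels it through TypeAttribute. A sketch; the sample input is made up.

    import java.io.StringReader;
    import org.apache.lucene.analysis.email.UAX29URLEmailTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

    public class EmailDemo {
      public static void main(String[] args) throws Exception {
        try (UAX29URLEmailTokenizer tok = new UAX29URLEmailTokenizer()) {
          tok.setReader(new StringReader(
              "Mail alice@example.com or visit https://lucene.apache.org"));
          CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
          TypeAttribute type = tok.addAttribute(TypeAttribute.class);
          tok.reset();
          while (tok.incrementToken()) {
            // the address carries type <EMAIL>, the URL carries type <URL>
            System.out.println(term + "  " + type.type());
          }
          tok.end();
        }
      }
    }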
Uses of Tokenizer in org.apache.lucene.analysis.icu.segmentation

Subclasses of Tokenizer in org.apache.lucene.analysis.icu.segmentation

final class ICUTokenizer
    Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
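ICUTokenizer is useful where word boundaries are not written as spaces; ICU's break iterators segment scripts such as Thai with dictionary-based rules. A minimal sketch, assuming the no-argument constructor with its default segmentation config; the sample string is arbitrary.

    import java.io.StringReader;
    import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class IcuDemo {
      public static void main(String[] args) throws Exception {
        try (ICUTokenizer tok = new ICUTokenizer()) {  // default per-script config
          tok.setReader(new StringReader("สวัสดีครับ hello"));
          CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
          tok.reset();
          while (tok.incrementToken()) {
            System.out.println(term.toString());
          }
          tok.end();
        }
      }
    }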
Uses of Tokenizer in org.apache.lucene.analysis.ja

Subclasses of Tokenizer in org.apache.lucene.analysis.ja

final class JapaneseTokenizer
    Tokenizer for Japanese that uses morphological analysis.
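A sketch of the morphological path, assuming the three-argument constructor (user dictionary, punctuation handling, segmentation mode); Mode.SEARCH additionally decompounds long compound nouns, which usually improves recall.

    import java.io.StringReader;
    import org.apache.lucene.analysis.ja.JapaneseTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class KuromojiDemo {
      public static void main(String[] args) throws Exception {
        // null = no user dictionary; true = discard punctuation; SEARCH decompounds
        try (JapaneseTokenizer tok =
                 new JapaneseTokenizer(null, true, JapaneseTokenizer.Mode.SEARCH)) {
          tok.setReader(new StringReader("関西国際空港"));
          CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
          tok.reset();
          while (tok.incrementToken()) {
            System.out.println(term.toString());  // e.g. 関西, 国際, 空港
          }
          tok.end();
        }
      }
    }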
Uses of Tokenizer in org.apache.lucene.analysis.ko

Subclasses of Tokenizer in org.apache.lucene.analysis.ko

final class KoreanTokenizer
    Tokenizer for Korean that uses morphological analysis.
Uses of Tokenizer in org.apache.lucene.analysis.ngram

Subclasses of Tokenizer in org.apache.lucene.analysis.ngram

class EdgeNGramTokenizer
    Tokenizes the input from an edge into n-grams of given size(s).
class NGramTokenizer
    Tokenizes the input into n-grams of the given size(s).

Methods in org.apache.lucene.analysis.ngram that return Tokenizer

EdgeNGramTokenizerFactory.create(AttributeFactory factory)
NGramTokenizerFactory.create(AttributeFactory factory)
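Both classes take (minGram, maxGram) bounds in their simplest constructors; NGramTokenizer slides across the whole input, while EdgeNGramTokenizer anchors every gram at the start. A sketch reusing the consumption loop from above; the demo class is illustrative.

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.ngram.EdgeNGramTokenizer;
    import org.apache.lucene.analysis.ngram.NGramTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class NGramDemo {
      static void dump(Tokenizer tok) throws Exception {
        tok.setReader(new StringReader("fox"));
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
        tok.reset();
        while (tok.incrementToken()) {
          System.out.print(term + " ");
        }
        tok.end();
        tok.close();
        System.out.println();
      }

      public static void main(String[] args) throws Exception {
        dump(new NGramTokenizer(2, 3));      // all 2- and 3-grams: fo, fox, ox
        dump(new EdgeNGramTokenizer(1, 3));  // prefixes only: f, fo, fox
      }
    }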
Uses of Tokenizer in org.apache.lucene.analysis.path

Subclasses of Tokenizer in org.apache.lucene.analysis.path

class PathHierarchyTokenizer
    Tokenizer for path-like hierarchies.
class ReversePathHierarchyTokenizer
    Tokenizer for domain-like hierarchies.

Methods in org.apache.lucene.analysis.path that return Tokenizer
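PathHierarchyTokenizer emits each ancestor prefix as its own token, which makes filtering on "everything under this directory" a cheap term match; ReversePathHierarchyTokenizer does the mirror image for domain-style strings. A sketch with the default '/' delimiter.

    import java.io.StringReader;
    import org.apache.lucene.analysis.path.PathHierarchyTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class PathDemo {
      public static void main(String[] args) throws Exception {
        try (PathHierarchyTokenizer tok = new PathHierarchyTokenizer()) {
          tok.setReader(new StringReader("/usr/local/bin"));
          CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
          tok.reset();
          while (tok.incrementToken()) {
            System.out.println(term);  // /usr, then /usr/local, then /usr/local/bin
          }
          tok.end();
        }
      }
    }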
Uses of Tokenizer in org.apache.lucene.analysis.pattern

Subclasses of Tokenizer in org.apache.lucene.analysis.pattern

final class PatternTokenizer
    This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
final class SimplePatternSplitTokenizer
final class SimplePatternTokenizer
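PatternTokenizer works in two directions depending on its group argument: group -1 treats the pattern as the separator (like String.split), while group >= 0 emits the matched group itself as the token. A sketch of the split style; pattern and input are illustrative.

    import java.io.StringReader;
    import java.util.regex.Pattern;
    import org.apache.lucene.analysis.pattern.PatternTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class PatternDemo {
      public static void main(String[] args) throws Exception {
        // group -1: the pattern marks separators, text between matches is the token
        try (PatternTokenizer tok =
                 new PatternTokenizer(Pattern.compile(";\\s*"), -1)) {
          tok.setReader(new StringReader("alpha; beta;gamma"));
          CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
          tok.reset();
          while (tok.incrementToken()) {
            System.out.println(term);  // alpha, beta, gamma
          }
          tok.end();
        }
      }
    }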
Uses of Tokenizer in org.apache.lucene.analysis.standard

Subclasses of Tokenizer in org.apache.lucene.analysis.standard

final class StandardTokenizer
    A grammar-based tokenizer constructed with JFlex.
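StandardTokenizer is the usual default: UAX #29 word breaks plus a configurable maximum token length (255 by default). A sketch; sample text is arbitrary.

    import java.io.StringReader;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class StandardDemo {
      public static void main(String[] args) throws Exception {
        try (StandardTokenizer tok = new StandardTokenizer()) {
          tok.setMaxTokenLength(255);  // the default limit, set here for visibility
          tok.setReader(new StringReader("UAX #29 governs word breaks, e.g. can't."));
          CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
          tok.reset();
          while (tok.incrementToken()) {
            System.out.println(term);
          }
          tok.end();
        }
      }
    }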
Uses of Tokenizer in org.apache.lucene.analysis.th

Subclasses of Tokenizer in org.apache.lucene.analysis.th

class ThaiTokenizer

Methods in org.apache.lucene.analysis.th that return Tokenizer
Uses of Tokenizer in org.apache.lucene.analysis.util

Subclasses of Tokenizer in org.apache.lucene.analysis.util

class CharTokenizer
    An abstract base class for simple, character-oriented tokenizers.
class SegmentingTokenizerBase
    Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.
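CharTokenizer reduces writing a whole tokenizer to one decision per code point: isTokenChar(int) says whether the character belongs inside a token. A sketch of a hypothetical comma-separated-values tokenizer built on it.

    import org.apache.lucene.analysis.util.CharTokenizer;

    /** Hypothetical example: everything except ',' is a token character,
        so tokens are the runs of text between commas. */
    public final class CommaTokenizer extends CharTokenizer {
      @Override
      protected boolean isTokenChar(int c) {
        return c != ',';
      }
    }

An instance then behaves exactly like the core tokenizers above: setReader, reset, incrementToken, end.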
Uses of Tokenizer in org.apache.lucene.analysis.wikipedia

Subclasses of Tokenizer in org.apache.lucene.analysis.wikipedia

final class WikipediaTokenizer
    Extension of StandardTokenizer that is aware of Wikipedia syntax.
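Because it extends the StandardTokenizer grammar, plain prose tokenizes as usual, while markup-derived tokens (for example internal-link targets) are distinguished by their TypeAttribute value. A sketch, assuming the no-argument constructor; the sample markup is made up.

    import java.io.StringReader;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
    import org.apache.lucene.analysis.wikipedia.WikipediaTokenizer;

    public class WikiDemo {
      public static void main(String[] args) throws Exception {
        try (WikipediaTokenizer tok = new WikipediaTokenizer()) {
          tok.setReader(new StringReader("See [[World Wide Web|the web]] for details."));
          CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
          TypeAttribute type = tok.addAttribute(TypeAttribute.class);
          tok.reset();
          while (tok.incrementToken()) {
            // link tokens carry a wiki-specific type instead of <ALPHANUM>
            System.out.println(term + "  " + type.type());
          }
          tok.end();
        }
      }
    }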