Package org.languagetool.language
Class LanguageIdentifier
java.lang.Object
org.languagetool.language.LanguageIdentifier
Identify the language of a text. Note that some languages might never be
detected because they are close to another language. Language variants like
en-US or en-GB are not detected; the result will be en for those.
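Because only the base code is reported, a caller that stores full tags such as en-US may want to reduce them before comparing against a detection result. The following is a minimal sketch using only the JDK; the class and method names (`BaseLanguageCodeSketch`, `baseCode`, `matches`) are hypothetical and not part of LanguageTool.

```java
import java.util.Locale;

/** Hypothetical helper, not part of LanguageTool. */
final class BaseLanguageCodeSketch {

  /** Reduces a BCP 47 tag such as "en-US" to its primary language subtag. */
  static String baseCode(String languageTag) {
    return Locale.forLanguageTag(languageTag).getLanguage();
  }

  /** True if a detected base code is compatible with an expected (possibly variant) tag. */
  static boolean matches(String detectedCode, String expectedTag) {
    return baseCode(expectedTag).equals(detectedCode);
  }

  public static void main(String[] args) {
    System.out.println(baseCode("en-GB"));
    System.out.println(matches("en", "en-US"));
  }
}
```

Comparing on the base code avoids treating a correct detection of en as a mismatch for a document tagged en-GB.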
By default, only the first 1000 characters of a text are considered.
Email signatures that use \n-- \n as a delimiter are ignored.
- Since:
- 2.9
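The two input rules described above (a character limit and signature removal) can be sketched as a small preprocessing step. This is an illustration only, not the class's actual implementation: the names `DetectionInputSketch` and `prepare` are hypothetical, and the order of the two steps (truncate first, then cut the signature) is an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of LanguageTool. */
final class DetectionInputSketch {

  // Default taken from the class description ("first 1000 characters").
  static final int DEFAULT_MAX_LENGTH = 1000;

  // Assumption: a signature starts at the first "\n-- \n" delimiter.
  static final Pattern SIGNATURE_DELIMITER = Pattern.compile("\n-- \n");

  /** Keeps at most maxLength characters, then drops everything from the signature delimiter on. */
  static String prepare(String text, int maxLength) {
    String shortened = text.length() > maxLength ? text.substring(0, maxLength) : text;
    // With limit 2 the array always has at least one element: the part before the delimiter.
    return SIGNATURE_DELIMITER.split(shortened, 2)[0];
  }

  public static void main(String[] args) {
    String mail = "Hello, how are you?\n-- \nJohn Doe\nExample Corp";
    System.out.println(prepare(mail, DEFAULT_MAX_LENGTH));
  }
}
```

Note that the delimiter includes the trailing space after the two hyphens, so a line containing only `--` would not be treated as a signature marker in this sketch.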
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
(package private) class
-
Field Summary
Fields
Modifier and Type    Field    Description
private static final int
private boolean
private BufferedReader
private BufferedWriter
private Process
private static final int
private final com.optimaize.langdetect.LanguageDetector
private static final org.slf4j.Logger
private final int
private static final double
private static final int
private static final Pattern
private final com.optimaize.langdetect.text.TextObjectFactory
private static final float
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type    Method    Description
private boolean    canLanguageBeDetected(String langCode, List<String> additionalLanguageCodes)
@Nullable Language    detectLanguage(String text)
@Nullable DetectedLanguage    detectLanguageCode(String text)
(package private) @Nullable DetectedLanguage
void    enableFasttext(File fasttextBinary, File fasttextModel)
getHighestScoringResult(Map<String, Double> probs)
private List<com.optimaize.langdetect.profiles.LanguageProfile>    loadProfiles(List<String> langCodes)
runFasttext(String text, List<String> additionalLanguageCodes)
private void    startFasttext(File modelPath, File binaryPath)
-
Field Details
-
logger
private static final org.slf4j.Logger logger
-
MINIMAL_CONFIDENCE
private static final double MINIMAL_CONFIDENCE
-
K_HIGHEST_SCORES
private static final int K_HIGHEST_SCORES
-
SHORT_ALGO_THRESHOLD
private static final int SHORT_ALGO_THRESHOLD
-
CONSIDER_ONLY_PREFERRED_THRESHOLD
private static final int CONSIDER_ONLY_PREFERRED_THRESHOLD
-
SIGNATURE
-
ignoreLangCodes
-
externalLangCodes
-
THRESHOLD
private static final float THRESHOLD
-
languageDetector
private final com.optimaize.langdetect.LanguageDetector languageDetector
-
textObjectFactory
private final com.optimaize.langdetect.text.TextObjectFactory textObjectFactory
-
maxLength
private final int maxLength
-
fasttextEnabled
private boolean fasttextEnabled
-
fasttextProcess
-
fasttextIn
-
fasttextOut
-
-
Constructor Details
-
LanguageIdentifier
public LanguageIdentifier()
-
LanguageIdentifier
public LanguageIdentifier(int maxLength)
- Parameters:
maxLength - the maximum number of characters that will be considered; a lower value can help with performance. Don't use values below 100, as this would decrease accuracy.
- Throws:
IllegalArgumentException - if maxLength is less than 10
- Since:
- 4.2
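The constructor contract distinguishes a hard limit (values below 10 are rejected) from a recommendation (values below 100 reduce accuracy). A minimal sketch of such a check follows; the class `MaxLengthCheckSketch` and its methods are hypothetical and only mirror the documented behaviour, not the actual constructor code.

```java
/** Hypothetical helper, not part of LanguageTool. */
final class MaxLengthCheckSketch {

  // Taken from the Throws clause: values below 10 are rejected.
  static final int HARD_MINIMUM = 10;

  // Taken from the parameter description: values below 100 decrease accuracy.
  static final int RECOMMENDED_MINIMUM = 100;

  /** Returns maxLength unchanged, or throws if it is below the hard minimum. */
  static int validate(int maxLength) {
    if (maxLength < HARD_MINIMUM) {
      throw new IllegalArgumentException(
          "maxLength must be >= " + HARD_MINIMUM + ": " + maxLength);
    }
    return maxLength;
  }

  /** True if the value is accepted and also meets the documented recommendation. */
  static boolean isRecommended(int maxLength) {
    return maxLength >= RECOMMENDED_MINIMUM;
  }

  public static void main(String[] args) {
    System.out.println(validate(1000));
    System.out.println(isRecommended(50));
  }
}
```

A value such as 50 would therefore be accepted by the constructor but falls below the recommended range.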
-
-
Method Details
-
enableFasttext
-
getLanguageCodes
-
loadProfiles
private List<com.optimaize.langdetect.profiles.LanguageProfile> loadProfiles(List<String> langCodes) throws IOException
- Throws:
IOException
-
detectLanguage
- Returns:
- language or null if language could not be identified
-
detectLanguageWithDetails
- Returns:
- language or null if language could not be identified
-
detectLanguage
@Nullable public @Nullable DetectedLanguage detectLanguage(String text, List<String> noopLangsTmp, List<String> preferredLangsTmp)
- Parameters:
noopLangsTmp - list of codes that are detected but will lead to the NoopLanguage that has no rules
- Returns:
- language or null if language could not be identified
- Since:
- 4.4 (new parameter noopLangs, changed return type to DetectedLanguage)
-
canLanguageBeDetected
-
startFasttext
- Throws:
IOException
-
getHighestScoringResult
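The method summary lists this method as taking a `Map<String, Double> probs`; its return type and body are not shown on this page. As an illustration of what selecting the highest-scoring entry from such a map could look like, here is a sketch under those assumptions. The class `HighestScoreSketch`, the method `highest`, and the `Optional` return type are hypothetical choices, not the library's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper, not part of LanguageTool. */
final class HighestScoreSketch {

  /** Returns the entry with the largest probability, or empty if the map has no entries. */
  static Optional<Map.Entry<String, Double>> highest(Map<String, Double> probs) {
    return probs.entrySet().stream().max(Map.Entry.comparingByValue());
  }

  public static void main(String[] args) {
    Map<String, Double> probs = new HashMap<>();
    probs.put("en", 0.91);
    probs.put("de", 0.06);
    probs.put("nl", 0.03);
    highest(probs).ifPresent(e -> System.out.println(e.getKey() + " " + e.getValue()));
  }
}
```

Returning an `Optional` makes the "nothing detected" case explicit in this sketch; the real method may handle an empty map differently.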
-
runFasttext
private Map<String,Double> runFasttext(String text, List<String> additionalLanguageCodes) throws IOException
- Throws:
IOException
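This method returns a map from language code to score. How it obtains that map is not documented here. As a hedged illustration, the sketch below parses one line in the format printed by fastText's `predict-prob` command, where labels and probabilities alternate (for example `__label__en 0.97 __label__de 0.01`). The class `FasttextOutputSketch` and the method `parse` are hypothetical, and that the class reads this exact format is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of LanguageTool. */
final class FasttextOutputSketch {

  // Prefix that fastText puts in front of every label.
  static final String LABEL_PREFIX = "__label__";

  /** Parses alternating "label probability" tokens into a map that keeps the input order. */
  static Map<String, Double> parse(String line) {
    Map<String, Double> probs = new LinkedHashMap<>();
    String[] tokens = line.trim().split("\\s+");
    for (int i = 0; i + 1 < tokens.length; i += 2) {
      String label = tokens[i];
      if (!label.startsWith(LABEL_PREFIX)) {
        throw new IllegalArgumentException("Expected a label but got: " + label);
      }
      String code = label.substring(LABEL_PREFIX.length());
      probs.put(code, Double.parseDouble(tokens[i + 1]));
    }
    return probs;
  }

  public static void main(String[] args) {
    System.out.println(parse("__label__en 0.97 __label__de 0.01"));
  }
}
```

An empty line yields an empty map in this sketch, which a caller could treat as "no language identified".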
-
detectLanguageCode
- Returns:
- language or null if language could not be identified
-