Package edu.berkeley.nlp.lm.io
Class MakeKneserNeyArpaFromText
java.lang.Object
edu.berkeley.nlp.lm.io.MakeKneserNeyArpaFromText
Estimates a Kneser-Ney language model from raw text and writes the language
model out in ARPA format. This is meant to closely resemble the functionality
of SRILM's
ngram-count -text <text file> -ukndiscount -lm <outputfile>
with two main exceptions:
(a) Rather than calculating the discount for each n-gram order from counts, a constant discount of 0.75 is used for all orders.
(b) Count thresholding is currently not implemented (SRILM by default thresholds counts for n-grams with n > 3).
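To make exception (a) concrete, here is a minimal sketch of a bigram Kneser-Ney estimate with a fixed discount of 0.75. It uses the textbook interpolated formulation purely for illustration and is not taken from this class's implementation; the class name ConstantDiscountSketch and all of its members are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: a bigram Kneser-Ney estimate with one fixed discount.
class ConstantDiscountSketch {
    // The constant discount mentioned in the class description (same for all orders).
    static final double DISCOUNT = 0.75;

    // bigramCounts.get(prev).get(word) = number of times "prev word" was seen.
    private final Map<String, Map<String, Integer>> bigramCounts = new HashMap<>();
    // leftContexts.get(word) = distinct words seen immediately before word.
    private final Map<String, Set<String>> leftContexts = new HashMap<>();
    // Number of distinct bigram types seen so far.
    private int distinctBigrams = 0;

    void addSentence(String[] words) {
        for (int i = 1; i < words.length; i++) {
            String prev = words[i - 1];
            String word = words[i];
            int updated = bigramCounts
                    .computeIfAbsent(prev, k -> new HashMap<>())
                    .merge(word, 1, Integer::sum);
            if (updated == 1) { // first time this bigram type is seen
                distinctBigrams++;
                leftContexts.computeIfAbsent(word, k -> new HashSet<>()).add(prev);
            }
        }
    }

    // Continuation probability: share of distinct bigram types that end in word.
    double continuationProb(String word) {
        Set<String> left = leftContexts.get(word);
        return left == null ? 0.0 : (double) left.size() / distinctBigrams;
    }

    // Interpolated estimate of P(word | prev) using the fixed discount.
    double prob(String prev, String word) {
        Map<String, Integer> followers = bigramCounts.get(prev);
        if (followers == null) {
            return continuationProb(word); // unseen context: lower-order estimate only
        }
        int contextCount = 0;
        for (int c : followers.values()) {
            contextCount += c;
        }
        int count = followers.getOrDefault(word, 0);
        double discounted = Math.max(count - DISCOUNT, 0.0) / contextCount;
        // Mass removed by discounting, redistributed via the continuation distribution.
        double backoffMass = DISCOUNT * followers.size() / contextCount;
        return discounted + backoffMass * continuationProb(word);
    }
}
```

Because the same 0.75 is subtracted from every seen count, no count-of-counts statistics are needed, which is the simplification exception (a) describes.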
Note that if the input/output files have a .gz suffix, they will be unzipped/zipped as necessary. If no input files are given (or "-" is specified), lines will be read from standard input.
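The suffix-based behaviour described in the note can be sketched as follows. This is an assumed illustration using only the standard java.util.zip classes, not the helper code this package actually uses; the class GzAwareIoSketch and its method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only: choose plain or gzip streams by file suffix.
class GzAwareIoSketch {
    // Opens a text reader; null or "-" means standard input, ".gz" means gzip-compressed.
    static BufferedReader openReader(String path) throws IOException {
        InputStream in;
        if (path == null || path.equals("-")) {
            in = System.in;
        } else {
            in = new FileInputStream(path);
            if (path.endsWith(".gz")) {
                in = new GZIPInputStream(in);
            }
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Opens a text writer; a ".gz" suffix means the output is gzip-compressed.
    static BufferedWriter openWriter(String path) throws IOException {
        OutputStream out = new FileOutputStream(path);
        if (path.endsWith(".gz")) {
            out = new GZIPOutputStream(out);
        }
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }
}
```

The UTF-8 encoding here is an assumption of the sketch; the source does not state which encoding the tool expects.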
Author:
adampauls
Constructor Summary
Constructors
Method Summary
Constructor Details
MakeKneserNeyArpaFromText
public MakeKneserNeyArpaFromText()
Method Details
main