Package org.apache.uima.cas.impl
Class BinaryCasSerDes6
java.lang.Object
org.apache.uima.cas.impl.BinaryCasSerDes6
- All Implemented Interfaces:
SlotKindsConstants
User callable serialization and deserialization of the CAS in a compressed Binary Format
This serializes/deserializes the state of the CAS. It has the capability to map type systems,
so the sending and receiving type systems do not have to be the same.
- types and features are matched by name, and features must have the same range (slot kind)
- types and/or features in one type system not in the other are skipped over
The header tells the reader the format and the compression level.
How to Serialize:
1) create an instance of this class
a) if doing a delta serialization, pass in the mark and a ReuseInfo object that was created
after deserializing this CAS initially.
b) if serializing to a target with a different type system, pass the target's type system impl object
so the serialization can filter the types for the target.
2) call serialize() to serialize the CAS
3) If doing serialization to a target from which you expect to receive back a delta CAS,
create a ReuseInfo object from this object and reuse it for deserializing the delta CAS.
TypeSystemImpl objects are lazily augmented by customized TypeInfo instances for each type encountered in
serializing or deserializing. These are preserved for future calls, so their setup / initialization is only
needed the first time.
TypeSystemImpl objects are also lazily augmented by typeMappers for individual different target typesystems;
these too are preserved and reused on future calls.
Compressed Binary CASes are designed to be "self-describing" -
The format of the compressed binary CAS, including version info,
is inserted at the beginning so that a proper deserialization method can be automatically chosen.
Compressed Binary format implemented by this class supports type system mapping.
Types in the source which are not in the target
(or vice versa) are omitted.
Types with "extra" features have their extra features omitted
(or on deserialization, they are set to their default value - null, or 0, etc.).
Feature slots which hold references to types not in the target type system are replaced with 0 (null).
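The name-based matching described above can be sketched with plain maps. This is only an illustration, not the class's implementation: FeatureMatchSketch and its retained method are hypothetical names, each type is modelled as a map from feature name to range name, and treating a range mismatch as an error is an assumption based on the constructors' documented "incompatible type system" exception.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: match the features of one type by name between source and target. */
final class FeatureMatchSketch {

    /**
     * Returns the source features that also exist in the target.
     * Features missing from the target are skipped; a feature present in both
     * with different ranges is rejected (assumption, see lead-in).
     */
    static Map<String, String> retained(Map<String, String> srcFeatures,
                                        Map<String, String> tgtFeatures) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : srcFeatures.entrySet()) {
            String tgtRange = tgtFeatures.get(e.getKey());
            if (tgtRange == null) {
                continue; // not in the target: omitted
            }
            if (!tgtRange.equals(e.getValue())) {
                throw new IllegalArgumentException(
                    "range mismatch for feature " + e.getKey());
            }
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> src = new LinkedHashMap<>();
        src.put("begin", "int");
        src.put("end", "int");
        src.put("score", "double"); // "extra" feature, only in the source
        Map<String, String> tgt = new LinkedHashMap<>();
        tgt.put("begin", "int");
        tgt.put("end", "int");
        tgt.put("label", "String"); // only in the target: left at its default
        System.out.println(retained(src, tgt).keySet());
    }
}
```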
How to Deserialize:
1) get an appropriate CAS to deserialize into. For delta CAS, it does not have to be empty, but it must
be the originating CAS from which the delta was produced.
2) If the target type system == the CAS's and the serialized form is not Delta,
then call aCAS.reinit(source). Otherwise, create an instance of this class -> xxx
a) Assuming the object being deserialized has a different type system,
set the "target" type system to the TypeSystemImpl instance of the
object being deserialized.
b) if delta deserializing, pass in the ReuseInfo object created when the CAS was serialized
3) call xxx.deserialize(inputStream)
Compression/Decompression
Works in two stages:
application of Zip/Unzip to particular sub-collections of CAS data,
grouped according to similar data distribution
collection of like kinds of data (to make the zipping more effective)
There can be up to ~20 of these collections, such as
control info, float-exponents, string chars
Deserialization:
Read all bytes,
create separate ByteArrayInputStreams for each segment
create appropriate unzip data input streams for these
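The per-segment zip and unzip steps can be illustrated with java.util.zip. This is a minimal sketch under stated assumptions: SegmentZipSketch, zip and unzip are hypothetical names, and the real class keeps one stream per slot kind and records the compressed and uncompressed sizes in a control stream, which is modelled here only by passing the original length back in.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: deflate one collected segment and inflate it again. */
final class SegmentZipSketch {

    /** Compresses one segment (for example all string chars) on its own. */
    static byte[] zip(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores a segment; the uncompressed length is assumed to be recorded elsewhere. */
    static byte[] unzip(byte[] zipped, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(zipped);
        byte[] raw = new byte[originalLength];
        try {
            int off = 0;
            while (off < originalLength && !inflater.finished()) {
                int n = inflater.inflate(raw, off, originalLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // no more data can be produced
                }
                off += n;
            }
            if (off != originalLength) {
                throw new IllegalStateException("segment shorter than recorded length");
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return raw;
    }

    public static void main(String[] args) {
        byte[] strChars = "aaaaaaaaaabbbbbbbbbbaaaaaaaaaa".getBytes(StandardCharsets.UTF_8);
        byte[] zipped = zip(strChars);
        byte[] back = unzip(zipped, strChars.length);
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

Grouping like kinds of data before zipping tends to help because each segment then has a narrower value distribution than the interleaved whole.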
Slowly changing but expensive-to-compute data:
extra type system info - lazily created and added to shared TypeSystemImpl object
set up per type actually referenced
mapper for type system - lazily created and added to shared TypeSystemImpl object
in identity-map cache (size limit = 10 per source type system?) - key is target typesystemimpl.
Defaulting:
flags: doMeasurements, compressLevel, CompressStrategy
Per serialize call: cas, output, [target ts], [mark for delta]
Per deserialize call: cas, input, [target ts], whether-to-save-info-for-delta-serialization
CASImpl has instance method with defaulting args for serialization.
CASImpl has reinit which works with compressed binary serialization objects
if no type mapping
If type mapping, (new BinaryCasSerDes6(cas,
marker-or-null,
targetTypeSystem (for stream being deserialized),
reuseInfo-or-null))
.deserialize(in-stream)
Use Cases, filtering and delta
**************************************************************************
* (de)serialize * filter? * delta? * Use case
**************************************************************************
* serialize * N * N * Saving a Cas,
* * * * sending Cas to service with identical ts
**************************************************************************
* serialize * Y * N * sending Cas to service with
* * * * different ts (a guaranteed subset)
**************************************************************************
* serialize * N * Y * returning Cas to client
* * * * uses info saved when deserializing
* * * * (?? saving just a delta to disk??)
**************************************************************************
* serialize * Y * Y * NOT SUPPORTED (not needed)
**************************************************************************
* deserialize * N * N * reading/(receiving) CAS, identical TS
**************************************************************************
* deserialize * Y * N * reading/receiving CAS, different TS
* * * * ts not guaranteed to be superset
* * * * for "reading" case.
**************************************************************************
* deserialize * N * Y * receiving CAS, identical TS
* * * * uses info saved when serializing
**************************************************************************
* deserialize * Y * Y * receiving CAS, different TS (tgt a feature subset)
* * * * uses info saved when serializing
**************************************************************************
-
Nested Class Summary
Nested Classes
Columns: Modifier and Type, Class, Description
static enum
Compression alternatives
static enum
private class
Modified Values. Modified heap values need fsStartIndexes conversion
static class
Info reused for 1) multiple serializations of the same CAS to multiple targets (a speedup), or 2) delta CAS serialization, where it represents the fsStartIndex info before any modifications were made that could change that info, or 3) deserializing with a delta CAS, where it represents the fsStartIndex info at the time the CAS was serialized out.
private class
Modified Values. Output: for each FS that has 1 or more modified values, write the heap addr converted to a seq # of the FS. For all modified values within the FS: if it is an aux array element, write the index in the aux array and the new value; otherwise, write the slot offset and the new value.
-
Field Summary
Fields
Columns: Modifier and Type, Field, Description
private AllowPreexistingFS
Things for just deserialization
private DataInputStream
private final ByteArrayOutputStream[]
private final BinaryCasSerDes
private DataInputStream
private DataOutputStream
private final CASImpl
Things for both serialization and deserialization
private final BinaryCasSerDes6.CompressLevel
private final BinaryCasSerDes6.CompressStrat
private DataInputStream
private DataOutputStream
private TOP
the FS being deserialized
private final DataInputStream[]
private final boolean
private DataInputStream
private final boolean
private final DataOutputStream[]
private DataInputStream
private DataOutputStream
private DataInputStream
private DataOutputStream
private static final String
the "fixups" for relative heap refs; actions set slot values
private DataInputStream
private DataOutputStream
private DataInputStream
private DataOutputStream
private PositiveIntSet
ordered set of FSs found in indexes or linked from other found FSs.
private PositiveIntSet
ordered set of FSs found in indexes or linked from other found FSs, which are below the mark.
private DataInputStream
private DataOutputStream
private final CasSeqAddrMaps
maps from src id <-> tgt id. For deserialization: if the src type does not exist, tgt to src is 0
FSs being serialized.
private DataInputStream
private final Inflater[]
private DataInputStream
private boolean
private boolean
private final boolean
private boolean
private boolean
private boolean
This is the used version of isTypeMapping, normally == to isTypeMappingCmn. But compareCASes sets this false temporarily while setting up the compare.
private boolean
private int
private DataInputStream
private DataInputStream
private final MarkerImpl
private int
private boolean
private OptimizeStrings
private final Int2ObjHashMap<long[], long[]>
Hold prev values of "long" slots, by type, for instances of FS which are non-arrays containing slots which have long values, used for differencing. The actual FS instance is not used, because during deserialization it may not have been deserialized due to type filtering. Set only for non-filtered domain types, and only for non-0 values; if an fsRef is to a filtered type, the value serialized will be 0, but this slot is not set. On deserialization: if the value is 0, skip setting. First index: key is type code. 2nd index: key is slot-offset number (0-based).
private final int[][]
Hold prev instance of FS which have non-array FSRef slots, to allow computing these to match the case where a 0 value is used because of type filtering, and also to allow for forward references.
private String[]
private final boolean
private DataOutputStream
Things for just serialization
private DataInputStream
Deferred actions to set Feature Slots of feature structures.
private final SerializationMeasures
private String
private String
private int
used for deferred creation
private Sofa
private TypeSystemImpl
Things set up for one instance of this class
private DataInputStream
private final StringHeap
private DataInputStream
private DataOutputStream
private DataInputStream
private DataOutputStream
private DataInputStream
private DataOutputStream
private final TypeSystemImpl
FSs being processed, including below-the-line deltas.
private static final boolean
private static final boolean
private static final boolean
private static final boolean
private static final boolean
private DataInputStream
private DataOutputStream
private final CasTypeSystemMapper
private PositiveIntSet
Set of FSes on which UimaSerializable _save_to_cas_data has already been called.
private int
Fields inherited from interface org.apache.uima.cas.impl.SlotKindsConstants
arrayLength_i, byte_i, CAN_BE_NEGATIVE, control_i, double_Exponent_i, double_Mantissa_Sign_i, float_Exponent_i, float_Mantissa_Sign_i, fsIndexes_i, heapRef_i, IGNORED, IN_MAIN_HEAP, int_i, long_High_i, long_Low_i, NBR_SLOT_KIND_ZIP_STREAMS, short_i, strChars_i, strLength_i, strOffset_i, strSeg_i, typeCode_i
-
Constructor Summary
Constructors
Columns: Modifier, Constructor, Description
Setup to serialize (not delta) or deserialize (not delta) using binary compression, no type mapping but only processing reachable Feature Structures
Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping and only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs, boolean storeTS, boolean storeTSI)
Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping, optionally storing TSI, and only processing reachable Feature Structures
private BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, boolean storeTS, boolean storeTSI, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy)
BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs)
Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements)
Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures, output measurements
BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy)
Setup to serialize or deserialize using binary compression, with (optional) type mapping and only processing reachable Feature Structures
BinaryCasSerDes6(AbstractCas cas, TypeSystemImpl tgtTs)
Setup to serialize (not delta) or deserialize (not delta) using binary compression, with type mapping and only processing reachable Feature Structures
(package private) BinaryCasSerDes6(BinaryCasSerDes6 f6, TypeSystemImpl tgtTs)
only called to set up for deserialization.
-
Method Summary
Columns: Modifier and Type, Method, Description
private void
addStringsFromFS
(TOP fs) Add all the strings ref'd by this FS.private void
private void
Method: write with deflation into a single byte array stream
- skip if not worth deflating
- skip the Slot_Control stream
- record in the Slot_Control stream, for each deflated stream: the Slot index, the number of compressed bytes, the number of uncompressed bytes
- add to header: nbr of compressed entries, the Slot_Control stream size, the Slot_Control stream, all the zipped streams
boolean
compareCASes
(CASImpl c1, CASImpl c2) Compare 2 CASes, with perhaps different type systems.private void
createCurrentFs
(TypeImpl type, CASImpl view) private long
decodeDouble
(long mants, int exponent) private int
decodeIntSign
(int v) void
deserialize
(InputStream istream) void
deserialize
(InputStream istream, AllowPreexistingFS allowPreexistingFS) Version used by uima-as to read delta cas from remote parallel stepsvoid
deserializeAfterVersion
(DataInputStream istream, boolean isDelta, AllowPreexistingFS allowPreexistingFS) private int
encodeIntSign
(int v) private void
Add Fs to toBeProcessed and set foundxxx bit - skip this if doesn't exist in target type systemprivate DataInput
private int
getPrevIntValue
(int typeCode, int featOffset) For heaprefs this gets the previously serialized int valueprivate long
getPrevLongValue
(int typeCode, int featOffset) private TOP
getRefVal
(int tgtSeq) private int
For Serialization only.(package private) TypeSystemImpl
getTgtTs()
private int[]
Get and lazily initialize if needed the feature cache values for a type For Serializing, the type belongs to the srcTs For Deserializing, the type belongs to the tgtTsprivate long[]
Get and lazily initialize if needed the long values for a type For Serializing and Deserializing, the type belongs to the tgtTsprivate void
Serializing: Called at beginning of serialize, scans whole CAS or just delta CAS If doing delta serialization, fsStartIndexes is passed in, pre-initialized with a copy of the map info below the line.private boolean
isTypeInTgt
(TOP fs) private static DataOutputStream
private void
maybeStoreOrDefer
(boolean storeIt, TOP fs, Consumer<TOP> doStore) private void
maybeStoreOrDefer_slotFixups
(int tgtSeq, Consumer<TOP> r) FS Ref slots fixupsprivate void
processFSsForView
(boolean isEnqueue, Stream<TOP> fss) processes one view's worth of feature structuresprivate void
processIndexedFeatureStructures
(CASImpl cas1, boolean isWrite) private void
private int
private void
readByKind
(TOP fs, FeatureImpl tgtFeat, FeatureImpl srcFeat, boolean storeIt, TypeImpl tgtType) private int
private int
readDiff
(SlotKinds.SlotKind kind, int prev) private int
readDiffIntSlot
(boolean storeIt, int featOffset, SlotKinds.SlotKind kind, TypeImpl tgtType) private long
private int
private void
readFsxPart
(IntVector fsIndexes) Each FS index is sorted, and output is by deltaprivate CommonSerDes.Header
readHeader
(InputStream istream) HEADERSprivate void
process index information to re-index thingsprivate void
readIntoByteArray
(byte[] array, int length, boolean storeIt) private void
readIntoDoubleArray
(double[] array, SlotKinds.SlotKind kind, int length, boolean storeIt) private void
readIntoLongArray
(long[] array, SlotKinds.SlotKind kind, int length, boolean storeIt) private void
readIntoShortArray
(short[] array, int length, boolean storeIt) private long
readLongOrDouble
(SlotKinds.SlotKind kind, long prev) private String
readString
(boolean storeIt) private long
private long
readVlong
(DataInputStream dis) private int
S E R I A L I Z Eprivate void
serializeArray
(TOP fs) private int
private void
serializeByKind
(TOP fs, FeatureImpl feat) serialize one feature structure, which is guaranteed not to be null guaranteed to exist in target if there is type mapping Caller iterates over target slots, but the feat arg is for the corresponding src featureprivate void
serializeDiffWithPrevTypeSlot
(SlotKinds.SlotKind kind, TOP fs, FeatureImpl feat, int newValue) private static void
setupOutputStream
(int i, int size, ByteArrayOutputStream[] baosZipSources, DataOutputStream[] dosZipSources) private void
setupOutputStreams
(Object out) Set up Streams(package private) static void
setupOutputStreams
(CASImpl cas, ByteArrayOutputStream[] baosZipSources, DataOutputStream[] dosZipSources) private void
setupReadStream
(int slotIndex, int bytesCompr, int bytesOrig) private void
(package private) static void
skipBytes
(DataInputStream stream, int skipNumber) private void
skipDouble
(int length) private void
skipLong
(int length) private void
updatePrevArray0IntValue
(TypeImpl ti, int newValue) version called for arrays, captures the 0th valueprivate void
updatePrevIntValue
(TypeImpl ti, int featOffset, int newValue) Called for non-arraysprivate void
updatePrevLongValue
(TypeImpl ti, int featOffset, long newValue) private void
write0
(int kind) private int
writeDiff
(int kind, int v, int prev) Encoding: bit 6 = sign: 1 = negative bit 7 = delta: 1 = deltaprivate void
writeDouble
(long raw) private void
writeFloat
(int raw) private void
writeLong
(long v, long prev) private void
private void
private void
writeUnsignedByte
(DataOutputStream s, int v) private void
writeVnumber
(int kind, int v) private void
writeVnumber
(int kind, long v) private void
writeVnumber
(DataOutputStream s, int v) private void
writeVnumber
(DataOutputStream s, long v)
-
Field Details
-
EMPTY_STRING
-
TRACE_SER
private static final boolean TRACE_SER
-
TRACE_DES
private static final boolean TRACE_DES
-
TRACE_MOD_SER
private static final boolean TRACE_MOD_SER
-
TRACE_MOD_DES
private static final boolean TRACE_MOD_DES
-
TRACE_STR_ARRAY
private static final boolean TRACE_STR_ARRAY
-
srcTs
Things set up for one instance of this class -
tgtTs
-
compressLevel
-
compressStrategy
-
cas
Things for both serialization and Deserialization -
bcsd
-
stringHeapObj
-
nextFsId
private int nextFsId -
isSerializingDelta
private final boolean isSerializingDelta -
isDelta
private boolean isDelta -
isReadingDelta
private boolean isReadingDelta -
mark
-
fsStartIndexes
maps from src id <-> tgt id For deserialization: if src type not exist, tgt to src is 0 -
reuseInfoProvided
private final boolean reuseInfoProvided -
doMeasurements
private final boolean doMeasurements -
os
-
only1CommonString
private boolean only1CommonString -
isTsIncluded
private boolean isTsIncluded -
isTsiIncluded
private boolean isTsiIncluded -
typeMapper
-
isTypeMapping
private boolean isTypeMapping
This is the used version of isTypeMapping, normally == to isTypeMappingCmn. But compareCASes sets this false temporarily while setting up the compare.
-
prevHeapInstanceWithIntValues
private final int[][] prevHeapInstanceWithIntValues
Hold prev instance of FS which have non-array FSRef slots, to allow computing these to match the case where a 0 value is used because of type filtering, and also to allow for forward references.
Note: we can't use the actual prev FS, because with type filtering it may not exist, and even if it exists, it may not be fixed up (forward ref not yet deserialized).
- for each target typecode, only set if the type has 1 or more non-array fsref
- set only for non-filtered domain types
- set only for non-0 values; if an fsRef is to a filtered type, the value serialized will be 0, but this slot is not set
- on deserialization: if the value is 0, skip setting
- first index: key is type code
- 2nd index: key is slot-offset number (0-based)
Also used for array refs sometimes, for the 1st entry in the array: feature slot 0 is used for this when reading (not when writing; could be made more uniform).
-
prevFsWithLongValues
Hold prev values of "long" slots, by type, for instances of FS which are non-arrays containing slots which have long values, used for differencing. The actual FS instance is not used, because during deserialization it may not have been deserialized due to type filtering.
- set only for non-filtered domain types
- set only for non-0 values; if an fsRef is to a filtered type, the value serialized will be 0, but this slot is not set
- on deserialization: if the value is 0, skip setting
- first index: key is type code
- 2nd index: key is slot-offset number (0-based)
-
foundFSs
ordered set of FSs found in indexes or linked from other found FSs. used to control loops/recursion when locating things -
foundFSsBelowMark
ordered set of FSs found in indexes or linked from other found FSs, which are below the mark. used to control loops/recursion when locating things -
fssToSerialize
FSs being serialized. For delta, just the deltas above the delta line. Constructed from indexed plus reachable, above the delta line. -
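The "indexed plus reachable" collection, with a found-set guarding against loops, can be sketched as a plain graph walk. This is an illustration only: ReachableScanSketch is a hypothetical name, feature structures are reduced to integer ids, and references are given as an adjacency map rather than read from real FS slots.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: collect everything reachable from the indexed items. */
final class ReachableScanSketch {

    /**
     * @param indexed ids found in indexes (the starting points)
     * @param refs    id -> ids it references (stand-in for FS reference slots)
     * @return the ids to serialize, in ascending id order
     */
    static List<Integer> reachable(List<Integer> indexed, Map<Integer, List<Integer>> refs) {
        Set<Integer> found = new LinkedHashSet<>();       // stops loops and repeats
        Deque<Integer> toBeScanned = new ArrayDeque<>();  // work queue
        for (Integer id : indexed) {
            if (found.add(id)) {
                toBeScanned.add(id);
            }
        }
        while (!toBeScanned.isEmpty()) {
            Integer id = toBeScanned.poll();
            for (Integer ref : refs.getOrDefault(id, Collections.emptyList())) {
                if (found.add(ref)) {
                    toBeScanned.add(ref);
                }
            }
        }
        List<Integer> result = new ArrayList<>(found);
        Collections.sort(result);
        return result;
    }
}
```

For delta serialization the same walk would additionally separate ids below the mark from those above it; that split is left out of this sketch.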
uimaSerializableSavedToCas
Set of FSes on which UimaSerializable _save_to_cas_data has already been called. -
toBeScanned
FSs being processed, including below-the-line deltas. -
debugEOF
private final boolean debugEOF
-
serializedOut
Things for just serialization -
sm
-
baosZipSources
-
dosZipSources
-
byte_dos
-
typeCode_dos
-
strOffset_dos
-
strLength_dos
-
float_Mantissa_Sign_dos
-
float_Exponent_dos
-
double_Mantissa_Sign_dos
-
double_Exponent_dos
-
fsIndexes_dos
-
control_dos
-
strSeg_dos
-
allowPreexistingFS
Things for just deserialization -
deserIn
-
version
private int version -
dataInputs
-
inflaters
-
fixupsNeeded
the "fixups" for relative heap refs actions set slot values -
uimaSerializableFixups
-
singleFsDefer
Deferred actions to set Feature Slots of feature structures: the deferrals needed when deserializing a subtype of AnnotationBase before the sofa is known. Also used for Sofa creation, where some fields are final.
-
sofaNum
private int sofaNum
used for deferred creation
-
sofaName
-
sofaMimeType
-
sofaRef
-
currentFs
the FS being deserialized -
isUpdatePrevOK
private boolean isUpdatePrevOK -
readCommonString
-
arrayLength_dis
-
heapRef_dis
-
int_dis
-
byte_dis
-
short_dis
-
typeCode_dis
-
strOffset_dis
-
strLength_dis
-
long_High_dis
-
long_Low_dis
-
float_Mantissa_Sign_dis
-
float_Exponent_dis
-
double_Mantissa_Sign_dis
-
double_Exponent_dis
-
fsIndexes_dis
-
strChars_dis
-
control_dis
-
strSeg_dis
-
lastArrayLength
private int lastArrayLength
-
-
Constructor Details
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy) throws ResourceInitializationException
Setup to serialize or deserialize using binary compression, with (optional) type mapping and only processing reachable Feature Structures
- Parameters:
aCas
- required - refs the CAS being serialized or deserialized into
mark
- if not null, is the serialization mark for delta serialization. Unused for deserialization.
tgtTs
- if not null, is the target type system. For serialization, this is a subset of the CAS's TS; for deserialization, it is the type system of the serialized data being read.
rfs
- For delta serialization: must not be null, and is the value saved after deserializing the original, before any modifications or additions were made. For normal serialization: can be null, but if not, it is used in place of re-calculating, for speed. For delta deserialization: must not be null, and is the value saved after serializing to the service. For normal deserialization: must be null.
doMeasurements
- if true, measurements are done (on serialization)
compressLevel
- if not null, specifies enum instance for compress level
compressStrategy
- if not null, specifies enum instance for compress strategy
- Throws:
ResourceInitializationException
- if the target type system is incompatible with the source type system
-
BinaryCasSerDes6
private BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, boolean storeTS, boolean storeTSI, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy) throws ResourceInitializationException - Throws:
ResourceInitializationException
-
BinaryCasSerDes6
BinaryCasSerDes6(BinaryCasSerDes6 f6, TypeSystemImpl tgtTs) throws ResourceInitializationException
Only called to set up for deserialization. Clones the existing f6, but changes the tgtTs (used to decode).
- Parameters:
f6
- -
tgtTs
- used for decoding
- Throws:
ResourceInitializationException
- -
-
BinaryCasSerDes6
Setup to serialize (not delta) or deserialize (not delta) using binary compression, no type mapping but only processing reachable Feature Structures
- Parameters:
cas
- -
- Throws:
ResourceInitializationException
- never thrown
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, TypeSystemImpl tgtTs) throws ResourceInitializationException
Setup to serialize (not delta) or deserialize (not delta) using binary compression, with type mapping and only processing reachable Feature Structures
- Parameters:
cas
- -
tgtTs
- -
- Throws:
ResourceInitializationException
- if the target type system is incompatible with the source type system
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs) throws ResourceInitializationException
Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures
- Parameters:
cas
- -
mark
- -
tgtTs
- for deserialization, is the type system of the serialized data being read.
rfs
- Reused Feature Structure information - required for both delta serialization and delta deserialization
- Throws:
ResourceInitializationException
- if the target type system is incompatible with the source type system
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements) throws ResourceInitializationException
Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures, output measurements
- Parameters:
cas
- -
mark
- -
tgtTs
- for deserialization, is the type system of the serialized data being read.
rfs
- Reused Feature Structure information - speed up on serialization, required on delta deserialization
doMeasurements
- -
- Throws:
ResourceInitializationException
- if the target type system is incompatible with the source type system
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs) throws ResourceInitializationException
Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping and only processing reachable Feature Structures
- Parameters:
cas
- -
rfs
- -
- Throws:
ResourceInitializationException
- never thrown
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs, boolean storeTS, boolean storeTSI) throws ResourceInitializationException
Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping, optionally storing TSI, and only processing reachable Feature Structures
- Parameters:
cas
- -
rfs
- -
storeTS
- -
storeTSI
- -
- Throws:
ResourceInitializationException
- never thrown
-
-
Method Details
-
getReuseInfo
-
serialize
S E R I A L I Z E
- Parameters:
out
- -
- Returns:
- null or serialization measurements (depending on setting of doMeasurements)
- Throws:
IOException
- passthru
-
serializeArray
- Throws:
IOException
-
serializeByKind
serialize one feature structure, which is guaranteed not to be null and guaranteed to exist in the target if there is type mapping. The caller iterates over target slots, but the feat arg is for the corresponding src feature.
- Parameters:
fs
- the FS whose slot "feat" is to be serialized
feat
- the corresponding source feature slot to serialize
- Throws:
IOException
-
serializeArrayLength
- Throws:
IOException
-
serializeDiffWithPrevTypeSlot
private void serializeDiffWithPrevTypeSlot(SlotKinds.SlotKind kind, TOP fs, FeatureImpl feat, int newValue) throws IOException - Throws:
IOException
-
updatePrevIntValue
Called for non-arrays
- Parameters:
featOffset
- offset to the slot
newValue
- for heap refs, is the converted-from-addr-to-seq-number value
fs
- used to get the type
-
updatePrevLongValue
-
updatePrevArray0IntValue
version called for arrays, captures the 0th value
- Parameters:
ti
-
newValue
-
-
initPrevIntValue
Get and lazily initialize, if needed, the feature cache values for a type. For serializing, the type belongs to the srcTs; for deserializing, the type belongs to the tgtTs.
- Parameters:
ti
- the type
- Returns:
- the int feature cache
-
initPrevLongValue
Get and lazily initialize, if needed, the long values for a type. For serializing and deserializing, the type belongs to the tgtTs.
- Parameters:
ti
- the type
- Returns:
- the long feature cache
-
getPrevIntValue
private int getPrevIntValue(int typeCode, int featOffset)
For heaprefs this gets the previously serialized int value
- Parameters:
typeCode
- the type code
featOffset
- true offset, 1 = first feature...
- Returns:
- the previous int value for use in difference calculations
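The per-type, per-slot "previous value" bookkeeping used for difference encoding can be sketched as a lazily filled int[][]. This is a simplified illustration, not the class's code: PrevValueCacheSketch and its methods are hypothetical names, slot offsets are 0-based here, and the rule that a 0 value never updates the remembered value follows the field description for prevHeapInstanceWithIntValues.

```java
/** Hypothetical sketch: remember the last non-0 int written per (type, slot). */
final class PrevValueCacheSketch {
    private final int[] slotCounts; // number of int slots per type code
    private final int[][] prev;     // first index: type code, second: slot offset

    PrevValueCacheSketch(int[] slotCounts) {
        this.slotCounts = slotCounts.clone();
        this.prev = new int[slotCounts.length][];
    }

    /** Previous value for this type and slot, or 0 if none was recorded yet. */
    int getPrev(int typeCode, int featOffset) {
        int[] row = prev[typeCode];
        return row == null ? 0 : row[featOffset];
    }

    /** Records a new previous value, creating the row for the type on first use. */
    void updatePrev(int typeCode, int featOffset, int newValue) {
        if (prev[typeCode] == null) {
            prev[typeCode] = new int[slotCounts[typeCode]];
        }
        prev[typeCode][featOffset] = newValue;
    }

    /** Difference against the previous value; 0 values leave the cache untouched. */
    int diff(int typeCode, int featOffset, int newValue) {
        int d = newValue - getPrev(typeCode, featOffset);
        if (newValue != 0) {
            updatePrev(typeCode, featOffset, newValue);
        }
        return d;
    }
}
```

Successive values of the same slot in instances of the same type are often close together, so the differences tend to be small numbers that encode and compress well.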
-
getPrevLongValue
private long getPrevLongValue(int typeCode, int featOffset) -
collectAndZip
Method: write with deflation into a single byte array stream
- skip if not worth deflating
- skip the Slot_Control stream
- record in the Slot_Control stream, for each deflated stream: the Slot index, the number of compressed bytes, the number of uncompressed bytes
- add to header: nbr of compressed entries, the Slot_Control stream size, the Slot_Control stream, all the zipped streams
- Throws:
IOException
- passthru
-
writeLong
- Throws:
IOException
-
writeString
- Throws:
IOException
-
writeFloat
- Throws:
IOException
-
writeVnumber
- Throws:
IOException
-
writeVnumber
- Throws:
IOException
-
writeVnumber
- Throws:
IOException
-
writeVnumber
- Throws:
IOException
-
writeUnsignedByte
- Throws:
IOException
-
writeDouble
- Throws:
IOException
-
encodeIntSign
private int encodeIntSign(int v) -
writeDiff
Encoding: bit 6 = sign: 1 = negative; bit 7 = delta: 1 = delta
- Parameters:
kind
- selects the stream to write to
v
- runs from iHeap + 3 to end of array
prev
- for difference encoding. Sets isUpdatePrevOK true if ok to update prev, false if writing 0 for any reason, or max neg nbr
- Throws:
IOException
- passthru
-
write0
- Throws:
IOException
-
deserialize
- Parameters:
istream
- -
- Throws:
IOException
- -
-
deserialize
public void deserialize(InputStream istream, AllowPreexistingFS allowPreexistingFS) throws IOException
Version used by uima-as to read delta cas from remote parallel steps
- Parameters:
istream
- input stream
allowPreexistingFS
- what to do if item already exists below the mark
- Throws:
IOException
- passthru
-
deserializeAfterVersion
public void deserializeAfterVersion(DataInputStream istream, boolean isDelta, AllowPreexistingFS allowPreexistingFS) throws IOException - Throws:
IOException
-
createCurrentFs
-
readArray
- Parameters:
storeIt
-
srcType
- may be null if there's no source type for target when deserializing
tgtType
- the type being deserialized
- Throws:
IOException
-
getRefVal
-
readArrayLength
- Throws:
IOException
-
readByKind
private void readByKind(TOP fs, FeatureImpl tgtFeat, FeatureImpl srcFeat, boolean storeIt, TypeImpl tgtType) throws IOException
- Parameters:
tgtFeat
- the Feature being read
srcFeat
- the Feature being set (may be null if the feature doesn't exist)
storeIt
- false causes storing of values to be skipped
fs
- the feature structure to set the feature value in; may be null if it was deferred, which happens for Sofas and subtypes of AnnotationBase because those have "final" values. For Sofa, these are the sofaid (String) and sofanum (int). For AnnotationBase, this is the sofaRef (and the view).
- Throws:
IOException
- passthru
-
maybeStoreOrDefer
-
maybeStoreOrDefer_slotFixups
FS Ref slots fixups
- Parameters:
tgtSeq
- the int value of the target seq number
r
- is sofa-or-lfs.setFeatureValue-or-setLocalSofaData(TOP ref-d-fs)
-
readIndexedFeatureStructures
Process index information to re-index things
- Throws:
IOException
-
readFsxPart
Each FS index is sorted, and output is by delta
- Throws:
IOException
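Because each index is sorted, consecutive entries differ by small non-negative amounts. A minimal sketch of that idea follows, with hypothetical names (`toDeltas`, `fromDeltas`) and plain int ids standing in for feature structure ids; it does not reproduce the actual stream format.

```java
import java.util.Arrays;

// Hypothetical sketch of delta-coding a sorted id list; not the UIMA implementation.
class SortedDeltaSketch {

  /** Sorts a copy of the ids and returns each id's difference from its predecessor. */
  static int[] toDeltas(int[] ids) {
    int[] sorted = ids.clone();
    Arrays.sort(sorted);
    int[] deltas = new int[sorted.length];
    int prev = 0; // the first delta is taken relative to 0
    for (int i = 0; i < sorted.length; i++) {
      deltas[i] = sorted[i] - prev;
      prev = sorted[i];
    }
    return deltas;
  }

  /** Rebuilds the sorted ids by accumulating the deltas. */
  static int[] fromDeltas(int[] deltas) {
    int[] ids = new int[deltas.length];
    int prev = 0;
    for (int i = 0; i < deltas.length; i++) {
      prev += deltas[i];
      ids[i] = prev;
    }
    return ids;
  }
}
```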
-
getInputStream
-
readVnumber
- Throws:
IOException
-
readVlong
- Throws:
IOException
-
readIntoByteArray
- Throws:
IOException
-
readIntoShortArray
- Throws:
IOException
-
readIntoLongArray
private void readIntoLongArray(long[] array, SlotKinds.SlotKind kind, int length, boolean storeIt) throws IOException
- Throws:
IOException
-
readIntoDoubleArray
private void readIntoDoubleArray(double[] array, SlotKinds.SlotKind kind, int length, boolean storeIt) throws IOException
- Throws:
IOException
-
readDiff
- Throws:
IOException
-
readDiffIntSlot
private int readDiffIntSlot(boolean storeIt, int featOffset, SlotKinds.SlotKind kind, TypeImpl tgtType) throws IOException
- Throws:
IOException
-
readDiff
- Throws:
IOException
-
readLongOrDouble
- Throws:
IOException
-
skipLong
- Throws:
IOException
-
skipDouble
- Throws:
IOException
-
readFloat
- Throws:
IOException
-
decodeIntSign
private int decodeIntSign(int v)
-
readDouble
- Throws:
IOException
-
decodeDouble
private long decodeDouble(long mants, int exponent)
-
readVlong
- Throws:
IOException
-
readString
- Parameters:
storeIt
- true to store value, false to skip it
- Returns:
- the string
- Throws:
IOException
-
skipBytes
- Throws:
IOException
-
processIndexedFeatureStructures
- Throws:
IOException
-
processFSsForView
Processes one view's worth of feature structures
- Parameters:
fsIndexes
-
fsNdxStart
-
isDoingEnqueue
-
isWrite
-
- Throws:
IOException
-
enqueueFS
Add FS to toBeProcessed and set foundxxx bit; skip this if it doesn't exist in the target type system
- Parameters:
fs
-
-
isTypeInTgt
-
initSrcTgtIdMapsAndStrings
private void initSrcTgtIdMapsAndStrings()
Serializing: Called at beginning of serialize; scans whole CAS or just delta CAS. If doing delta serialization, fsStartIndexes is passed in, pre-initialized with a copy of the map info below the line.
-
addStringsFromFS
Add all the strings ref'd by this FS.
- if it is a string array, do all the array items
- else scan the features and do all string-valued features, in feature offset order
For delta, this isn't done here; another routine driven by FsChange info does this.
-
compareCASes
Compare 2 CASes, with perhaps different type systems. If the type systems are different, construct a type mapper and use that to selectively ignore types or features not in the other type system. The mapper is from CAS1 -> CAS2. When computing the things to compare from CAS1, filter to remove feature structures not reachable via indexes or refs.
- Parameters:
c1
- CAS to compare
c2
- CAS to compare
- Returns:
- true if equal (for types / features in both)
-
makeDataOutputStream
- Parameters:
f
- can be a DataOutputStream, an OutputStream, or a File
- Returns:
- a data output stream
- Throws:
FileNotFoundException
- passthru
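The three accepted argument kinds suggest a simple type dispatch. The following is a sketch under stated assumptions, not the UIMA source: the class name is hypothetical, and both the buffering of the file stream and the `IllegalArgumentException` for unsupported arguments are choices made for the example.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStream;

// Hypothetical sketch; not the UIMA implementation.
class DataOutputStreamFactorySketch {

  /** Returns a DataOutputStream for a DataOutputStream, an OutputStream, or a File. */
  static DataOutputStream makeDataOutputStream(Object f) throws FileNotFoundException {
    if (f instanceof DataOutputStream) {
      return (DataOutputStream) f; // already the right kind: reuse as-is
    }
    if (f instanceof OutputStream) {
      return new DataOutputStream((OutputStream) f);
    }
    if (f instanceof File) {
      // FileOutputStream is the source of the FileNotFoundException that is passed through
      return new DataOutputStream(new BufferedOutputStream(new FileOutputStream((File) f)));
    }
    // assumption: anything else is rejected
    throw new IllegalArgumentException("unsupported output target: " + f);
  }
}
```

The `DataOutputStream` check has to come first, because a `DataOutputStream` is also an `OutputStream` and would otherwise be wrapped a second time.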
-
setupOutputStreams
Set up Streams
- Throws:
FileNotFoundException
- passthru
-
setupOutputStreams
static void setupOutputStreams(CASImpl cas, ByteArrayOutputStream[] baosZipSources, DataOutputStream[] dosZipSources)
-
setupOutputStream
private static void setupOutputStream(int i, int size, ByteArrayOutputStream[] baosZipSources, DataOutputStream[] dosZipSources)
-
setupReadStreams
- Throws:
IOException
-
setupReadStream
- Throws:
IOException
-
closeDataInputs
private void closeDataInputs()
-
readHeader
HEADERS
- Throws:
IOException
- passthru
-
writeStringInfo
- Throws:
IOException
-
getTgtSeqFromSrcFS
For Serialization only. Map src FS to tgt seq number:
  fs == null -> 0
  type not in target -> 0
  map src fs._id to tgt seq
- Parameters:
fs
-
- Returns:
- 0 or the mapped src id
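The three mapping rules can be sketched with ordinary collections. Everything here is a hypothetical stand-in: the `Fs` class, the set of type names present in the target, and the id-to-sequence map are assumptions for illustration, not the data structures the class actually uses.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the src FS -> tgt seq mapping rules; not the UIMA implementation.
class TgtSeqSketch {

  /** Minimal stand-in for a feature structure: an id and a type name. */
  static final class Fs {
    final int id;
    final String typeName;

    Fs(int id, String typeName) {
      this.id = id;
      this.typeName = typeName;
    }
  }

  private final Set<String> typesInTarget;           // assumed: names of types the target knows
  private final Map<Integer, Integer> srcIdToTgtSeq; // assumed: src fs id -> tgt seq number

  TgtSeqSketch(Set<String> typesInTarget, Map<Integer, Integer> srcIdToTgtSeq) {
    this.typesInTarget = typesInTarget;
    this.srcIdToTgtSeq = srcIdToTgtSeq;
  }

  int getTgtSeqFromSrcFs(Fs fs) {
    if (fs == null) {
      return 0; // fs == null -> 0
    }
    if (!typesInTarget.contains(fs.typeName)) {
      return 0; // type not in target -> 0
    }
    // assumption: an id with no mapping also yields 0
    return srcIdToTgtSeq.getOrDefault(fs.id, 0);
  }
}
```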
-
getTgtTs
TypeSystemImpl getTgtTs()
-