Class COSDocument

java.lang.Object
org.apache.pdfbox.cos.COSBase
org.apache.pdfbox.cos.COSDocument
All Implemented Interfaces:
Closeable, AutoCloseable, COSObjectable

public class COSDocument extends COSBase implements Closeable
This is the in-memory representation of the PDF document. You must call close() on this object when you are done using it.
Author:
Ben Litchfield
  • Constructor Details

    • COSDocument

      public COSDocument()
      Constructor. Uses main memory to buffer PDF streams.
    • COSDocument

      public COSDocument(ScratchFile scratchFile)
      Constructor that will use the provided memory handler for storage of the PDF streams.
      Parameters:
      scratchFile - memory handler for buffering of PDF streams
  • Method Details

    • createCOSStream

      public COSStream createCOSStream()
      Creates a new COSStream using the current configuration for scratch files.
      Returns:
      the new COSStream
    • createCOSStream

      public COSStream createCOSStream(COSDictionary dictionary)
      Creates a new COSStream using the current configuration for scratch files. Not for public use. Only COSParser should call this method.
      Parameters:
      dictionary - the corresponding dictionary
      Returns:
      the new COSStream
    • getObjectByType

      public COSObject getObjectByType(COSName type) throws IOException
      This will get the first dictionary object by type.
      Parameters:
      type - The type of the object.
      Returns:
      This will return an object with the specified type.
      Throws:
      IOException - If there is an error getting the object
    • getObjectsByType

      public List<COSObject> getObjectsByType(String type) throws IOException
      This will get all dictionary objects by type.
      Parameters:
      type - The type of the object.
      Returns:
      A list of all objects with the specified type.
      Throws:
      IOException - If there is an error getting the object
    • getObjectsByType

      public List<COSObject> getObjectsByType(COSName type) throws IOException
      This will get all dictionary objects by type.
      Parameters:
      type - The type of the object.
      Returns:
      A list of all objects with the specified type.
      Throws:
      IOException - If there is an error getting the object
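The lookup methods above differ in whether they return the first match or every match. The following is a minimal sketch of that distinction, using plain maps as hypothetical stand-ins for COS dictionaries. The class TypeLookupSketch and its methods are illustrative only and are not part of PDFBox; the real methods work on COSObject and COSName and may throw IOException. Returning null when nothing matches is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each "object" is a map whose "Type" entry plays the role of /Type.
class TypeLookupSketch {

    // Same idea as getObjectsByType: collect every object whose type matches.
    static List<Map<String, String>> objectsByType(List<Map<String, String>> pool, String type) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> object : pool) {
            if (type.equals(object.get("Type"))) {
                matches.add(object);
            }
        }
        return matches;
    }

    // Same idea as getObjectByType: stop at the first match.
    // Returning null for "no match" is an assumption of this sketch.
    static Map<String, String> objectByType(List<Map<String, String>> pool, String type) {
        for (Map<String, String> object : pool) {
            if (type.equals(object.get("Type"))) {
                return object;
            }
        }
        return null;
    }

    // A small sample pool: one catalog and two pages.
    static List<Map<String, String>> samplePool() {
        List<Map<String, String>> pool = new ArrayList<>();
        pool.add(Collections.singletonMap("Type", "Catalog"));
        pool.add(Collections.singletonMap("Type", "Page"));
        pool.add(Collections.singletonMap("Type", "Page"));
        return pool;
    }
}
```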
    • getKey

      public COSObjectKey getKey(COSBase object)
      Returns the COSObjectKey for a given COS object, or null if there is none. This lookup iterates over all objects in a PDF, which may be slow for large files.
      Parameters:
      object - COS object
      Returns:
      key
    • print

      public void print()
      This will print contents to stdout.
    • setVersion

      public void setVersion(float versionValue)
      This will set the header version of this PDF document.
      Parameters:
      versionValue - The version of the PDF document.
    • getVersion

      public float getVersion()
      This will get the version extracted from the header of this PDF document.
      Returns:
      The header version.
    • setDecrypted

      public void setDecrypted()
      Signals that the document is decrypted completely.
    • isDecrypted

      public boolean isDecrypted()
      Indicates whether an encrypted PDF has already been decrypted after parsing.
      Returns:
      true indicates that the PDF is decrypted.
    • isEncrypted

      public boolean isEncrypted()
      This will tell if this is an encrypted document.
      Returns:
      true if this document is encrypted.
    • getEncryptionDictionary

      public COSDictionary getEncryptionDictionary()
      This will get the encryption dictionary if the document is encrypted or null if the document is not encrypted.
      Returns:
      The encryption dictionary.
    • setEncryptionDictionary

      public void setEncryptionDictionary(COSDictionary encDictionary)
      This will set the encryption dictionary. This should only be called when encrypting the document.
      Parameters:
      encDictionary - The encryption dictionary.
    • getDocumentID

      public COSArray getDocumentID()
      This will get the document ID.
      Returns:
      The document id.
    • setDocumentID

      public void setDocumentID(COSArray id)
      This will set the document ID.
      Parameters:
      id - The document id.
    • getCatalog

      public COSObject getCatalog() throws IOException
      Deprecated.
      This will get the document catalog.
      Returns:
      The catalog is the root of the document; never null.
      Throws:
      IOException - If no catalog can be found.
    • getObjects

      public List<COSObject> getObjects()
      This will get a list of all available objects. This method works only for loaded PDFs. It will return an empty list for PDFs created from scratch (this includes PDFs generated within PDFBox, e.g. by Splitter). This method will be removed in 3.0.
      Returns:
      A list of all objects, never null.
    • getTrailer

      public COSDictionary getTrailer()
      This will get the document trailer.
      Returns:
      the document trailer dict
    • setTrailer

      public void setTrailer(COSDictionary newTrailer)
      This will set the document trailer. Note that the trailer is a persistence construct, so setting it directly may not be appropriate in all cases.
      Parameters:
      newTrailer - the document trailer dictionary
    • getHighestXRefObjectNumber

      public long getHighestXRefObjectNumber()
      Internal PDFBox use only. Get the object number of the highest XRef stream. This is needed to avoid reusing such a number in incremental saving.
      Returns:
      The object number of the highest XRef stream, or 0 if there was no XRef stream.
    • setHighestXRefObjectNumber

      public void setHighestXRefObjectNumber(long highestXRefObjectNumber)
      Internal PDFBox use only. Sets the object number of the highest XRef stream. This is needed to avoid reusing such a number in incremental saving.
      Parameters:
      highestXRefObjectNumber - The object number of the highest XRef stream.
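The reason for tracking the highest XRef stream object number can be illustrated with a small sketch of how a writer might pick a fresh object number during incremental saving. This is an assumption about how such a value could be used, not a description of PDFBox internals; ObjectNumberSketch and nextFreeObjectNumber are hypothetical names.

```java
import java.util.Collection;

// Hypothetical helper, not part of PDFBox.
class ObjectNumberSketch {

    // Returns a number larger than every known object number and larger than
    // the highest XRef stream object number, so neither can be reused.
    static long nextFreeObjectNumber(Collection<Long> usedObjectNumbers,
                                     long highestXRefObjectNumber) {
        long highest = highestXRefObjectNumber;
        for (long number : usedObjectNumbers) {
            highest = Math.max(highest, number);
        }
        return highest + 1;
    }
}
```

If the xref table only lists objects 1, 2 and 5 but an XRef stream was stored as object 7, ignoring the second argument would hand out 6 and then 7, colliding with the XRef stream.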
    • accept

      public Object accept(ICOSVisitor visitor) throws IOException
      Visitor pattern double dispatch method.
      Specified by:
      accept in class COSBase
      Parameters:
      visitor - The object to notify when visiting this object.
      Returns:
      any object, depending on the visitor implementation, or null
      Throws:
      IOException - If an error occurs while visiting this object.
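For readers unfamiliar with double dispatch, here is a minimal sketch of the pattern that accept follows. All types below (SketchVisitor, SketchNode, DocumentNode, NameNode, DescribingVisitor) are hypothetical stand-ins for ICOSVisitor and the COS classes; the real interface has more methods, and its methods may throw IOException.

```java
// Hypothetical visitor interface with one method per node type.
interface SketchVisitor {
    Object visitFromDocument(DocumentNode document);
    Object visitFromName(NameNode name);
}

// Hypothetical base class; each subclass decides which visitor method to call.
abstract class SketchNode {
    abstract Object accept(SketchVisitor visitor);
}

final class DocumentNode extends SketchNode {
    @Override
    Object accept(SketchVisitor visitor) {
        // Second dispatch: the node selects the overload matching its own type.
        return visitor.visitFromDocument(this);
    }
}

final class NameNode extends SketchNode {
    final String value;

    NameNode(String value) {
        this.value = value;
    }

    @Override
    Object accept(SketchVisitor visitor) {
        return visitor.visitFromName(this);
    }
}

// A visitor that returns a short description of whatever it visits.
class DescribingVisitor implements SketchVisitor {
    @Override
    public Object visitFromDocument(DocumentNode document) {
        return "document";
    }

    @Override
    public Object visitFromName(NameNode name) {
        return "name:" + name.value;
    }
}
```

The caller only holds a SketchNode reference and calls accept; the result depends on both the runtime type of the node and the visitor implementation, which is why the return type is Object.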
    • close

      public void close() throws IOException
      This will close all storage and delete the temporary files.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException - If there is an error closing resources.
    • isClosed

      public boolean isClosed()
      Returns true if this document has been closed.
      Returns:
      true if the document has been closed.
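Because the class implements Closeable, a try-with-resources statement is one way to make sure close() is always called. The sketch below demonstrates the pattern with a hypothetical stand-in, TrackedResource, so that it runs without PDFBox on the classpath; with the real class the resource declaration would hold a COSDocument instead.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a closeable document that tracks its own state.
class TrackedResource implements Closeable {
    private boolean closed;

    @Override
    public void close() throws IOException {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class CloseSketch {

    // Uses the resource inside try-with-resources and reports whether it
    // was closed once the block has been left.
    static boolean closedAfterUse() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource inUse = resource) {
            // Work with the resource here; it is still open at this point.
            if (inUse.isClosed()) {
                throw new IllegalStateException("resource closed too early");
            }
        } catch (IOException e) {
            // close() may fail; wrap the checked exception for this sketch.
            throw new UncheckedIOException(e);
        }
        return resource.isClosed();
    }
}
```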
    • finalize

      protected void finalize() throws IOException
      Warns the user in the finalizer if the PDF document was not closed. The method also closes the document just in case, to avoid abandoned temporary files. It is still a good idea to close the PDF document as early as possible to conserve resources.
      Overrides:
      finalize in class Object
      Throws:
      IOException - if an error occurs while closing the temporary files
    • setWarnMissingClose

      public void setWarnMissingClose(boolean warn)
      Controls whether this instance shall issue a warning if the PDF document wasn't closed properly through a call to the close() method. If the PDF document is held in a cache governed by soft references it is impossible to reliably close the document before the warning is raised. By default, the warning is enabled.
      Parameters:
      warn - true enables the warning, false disables it.
    • dereferenceObjectStreams

      public void dereferenceObjectStreams() throws IOException
      This method will search the list of objects for objects of type ObjStm. If it finds any, it will parse out all of the objects from the stream that it contains.
      Throws:
      IOException - If there is an error parsing the stream.
    • getObjectFromPool

      public COSObject getObjectFromPool(COSObjectKey key) throws IOException
      This will get an object from the pool.
      Parameters:
      key - The object key.
      Returns:
      The object in the pool or a new one if it has not been parsed yet.
      Throws:
      IOException - If there is an error getting the proxy object.
    • removeObject

      public COSObject removeObject(COSObjectKey key)
      Removes an object from the object pool.
      Parameters:
      key - the object key
      Returns:
      the object that was removed or null if the object was not found
    • addXRefTable

      public void addXRefTable(Map<COSObjectKey,Long> xrefTableValues)
      Populate XRef HashMap with given values. Each entry maps ObjectKeys to byte offsets in the file.
      Parameters:
      xrefTableValues - xref table entries to be added
    • getXrefTable

      public Map<COSObjectKey,Long> getXrefTable()
      Returns the xrefTable which is a mapping of ObjectKeys to byte offsets in the file.
      Returns:
      mapping of ObjectKeys to byte offsets
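The xref table described above is a map from object keys to byte offsets. The following sketch shows that shape with hypothetical stand-ins: ObjKey plays the role of COSObjectKey (assumed here to consist of an object number and a generation number), and XrefTableSketch mimics the add and get methods. Merging with putAll is an assumption about how "populate" behaves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for COSObjectKey: object number plus generation number.
final class ObjKey {
    final long number;
    final int generation;

    ObjKey(long number, int generation) {
        this.number = number;
        this.generation = generation;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ObjKey)) {
            return false;
        }
        ObjKey key = (ObjKey) other;
        return number == key.number && generation == key.generation;
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, generation);
    }
}

// Hypothetical holder for the table; not the PDFBox implementation.
class XrefTableSketch {
    private final Map<ObjKey, Long> xrefTable = new HashMap<>();

    // Merges the given entries into the table (assumed behavior).
    void addXRefTable(Map<ObjKey, Long> xrefTableValues) {
        xrefTable.putAll(xrefTableValues);
    }

    // Returns the mapping of object keys to byte offsets in the file.
    Map<ObjKey, Long> getXrefTable() {
        return xrefTable;
    }
}
```

Keys used in a HashMap need value-based equals and hashCode, as in ObjKey above, so that a freshly constructed key finds the stored offset.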
    • setStartXref

      public void setStartXref(long startXrefValue)
      This method sets the startxref value of the document. This is only needed for incremental updates.
      Parameters:
      startXrefValue - the value for startXref
    • getStartXref

      public long getStartXref()
      Returns the startxref position of the parsed document. This is only needed for incremental updates.
      Returns:
      a long with the old position of the startxref
    • isXRefStream

      public boolean isXRefStream()
      Determines whether the trailer is an XRef stream.
      Returns:
      true if the trailer is an XRef stream
    • setIsXRefStream

      public void setIsXRefStream(boolean isXRefStreamValue)
      Sets isXRefStream to the given value. You need to take care that the version of your PDF is 1.5 or higher.
      Parameters:
      isXRefStreamValue - the new value for isXRefStream
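The version requirement mentioned for setIsXRefStream can be expressed as a simple guard. This is a minimal sketch; XRefStreamGuard and supportsXRefStream are hypothetical names, not part of PDFBox.

```java
// Hypothetical helper, not part of PDFBox.
class XRefStreamGuard {

    // XRef streams require a PDF header version of 1.5 or higher.
    static final float MIN_XREF_STREAM_VERSION = 1.5f;

    // Returns true if a document with this header version may use an XRef stream.
    static boolean supportsXRefStream(float headerVersion) {
        return headerVersion >= MIN_XREF_STREAM_VERSION;
    }
}
```

With a real document, one option would be to pass the result of getVersion() to such a check and, if it fails, raise the version with setVersion(1.5f) before calling setIsXRefStream(true).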