------------------
Apache Lucene Tika
------------------

"Toolkit for detecting and extracting metadata and structured text content
from various documents using existing parser libraries."

For more information:
http://lucene.apache.org/tika/