Saturday, June 22, 2013

PDF To Text Converter

The PDFToTextConverter program converts PDF files into text files. When the program runs, you can select one or more PDF files to convert, then wait for the conversion to finish. The waiting time depends mainly on the number of PDF files you selected and the size of each file. Note that a PDF file that contains no text cannot be converted.



PDFToTextConverter source code:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;
// iText 5 classes; the itextpdf JAR must be on the classpath
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class PDFToTextConverter {
    public static void main(String[] args) {
        selectPDFFiles();
    }

    // let the user select one or more PDF files and convert each of them
    public static void selectPDFFiles() {
        JFileChooser chooser = new JFileChooser();
        FileNameExtensionFilter filter = new FileNameExtensionFilter("PDF", "pdf");
        chooser.setFileFilter(filter);
        chooser.setMultiSelectionEnabled(true);
        int returnVal = chooser.showOpenDialog(null);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            File[] files = chooser.getSelectedFiles();
            System.out.println("Please wait...");
            for (int i = 0; i < files.length; i++) {
                // output goes to textfrompdf0.txt, textfrompdf1.txt, ... in the working directory
                convertPDFToText(files[i].toString(), "textfrompdf" + i + ".txt");
            }
            System.out.println("Conversion complete");
        }
    }

    // extract the text of every page of src and write it to the text file desc
    public static void convertPDFToText(String src, String desc) {
        PdfReader pr = null;
        // try-with-resources flushes and closes the writer even if extraction fails
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(desc))) {
            // create pdf reader
            pr = new PdfReader(src);
            // get the number of pages in the document
            int pNum = pr.getNumberOfPages();
            // extract text from each page and write it to the output text file
            for (int page = 1; page <= pNum; page++) {
                String text = PdfTextExtractor.getTextFromPage(pr, page);
                bw.write(text);
                bw.newLine();
            }
        } catch (Exception e) {
            // report the failing file and continue with the next one
            System.err.println("Could not convert " + src);
            e.printStackTrace();
        } finally {
            // release the PDF file handle
            if (pr != null) {
                pr.close();
            }
        }
    }

}
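
In the listing above, the output of each selected file is named textfrompdf0.txt, textfrompdf1.txt and so on. These are relative paths, so the files are created in the directory the program is started from. If you would rather have each text file sit next to its PDF under the same base name, something like the following sketch could work. The OutputNames class and the textPathFor method are hypothetical names made up for this illustration; they are not part of the original program or of iText.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original program.
final class OutputNames {

    // Returns a path in the same folder as the PDF, with the last
    // extension replaced by ".txt" (report.pdf becomes report.txt).
    static Path textPathFor(Path pdf) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // keep the whole name when there is no extension or only a leading dot
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base + ".txt");
    }

    public static void main(String[] args) {
        // prints the sibling .txt path for a sample PDF path
        System.out.println(textPathFor(Paths.get("docs", "report.pdf")));
    }
}
```

With this helper, the call inside the loop could become convertPDFToText(f.toString(), OutputNames.textPathFor(f.toPath()).toString()), where f is one of the selected File objects. Note that an existing .txt file with the same name would be overwritten.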

Converting a PDF document to a text file is simple. First, use the PdfReader class (in the iText library) to open the PDF document and get its number of pages. Once you have the PdfReader object, you can extract the text by calling the getTextFromPage(PdfReader reader, int pageNumber) method of the PdfTextExtractor class once for each page; the method returns the text of the given page. As the text of each page is extracted, the BufferedWriter class writes it out to the destination file.
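
The page loop described above can be tried without iText by hiding the PDF behind a small interface. The sketch below does that and also counts pages that yield no text, which is one way to notice a PDF that has nothing to extract. PageTextSource and PageTextWriter are hypothetical names invented for this example; in the real program their role is played by PdfReader.getNumberOfPages() and PdfTextExtractor.getTextFromPage(reader, page).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical stand-in for a PDF document (an assumption of this sketch).
interface PageTextSource {
    int pageCount();
    String textOfPage(int page) throws IOException; // pages are numbered from 1
}

final class PageTextWriter {

    // Writes the text of every page to out, one page after another,
    // and returns how many pages yielded no text at all.
    static int writeAllPages(PageTextSource source, Writer out) throws IOException {
        int emptyPages = 0;
        BufferedWriter bw = new BufferedWriter(out);
        for (int page = 1; page <= source.pageCount(); page++) {
            String text = source.textOfPage(page);
            if (text == null || text.trim().isEmpty()) {
                emptyPages++;
            }
            bw.write(text == null ? "" : text);
            bw.newLine();
        }
        bw.flush(); // push buffered text through; the caller owns and closes out
        return emptyPages;
    }

    public static void main(String[] args) throws IOException {
        final String[] pages = {"First page", "", "Third page"};
        PageTextSource demo = new PageTextSource() {
            public int pageCount() { return pages.length; }
            public String textOfPage(int page) { return pages[page - 1]; }
        };
        StringWriter out = new StringWriter();
        int empty = writeAllPages(demo, out);
        System.out.print(out);
        System.out.println(empty + " page(s) had no extractable text");
    }
}
```

If every page comes back empty, the output file will contain only blank lines, which usually means the PDF holds images rather than text.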


23 comments:

  1. I have used the Aspose.PDF for .NET API to convert my text files to PDF, and it produced exactly the result I wanted. You can also convert PDF files to text, even if your files are large. Try this API; I hope you will like it too.

    http://www.aspose.com/java/pdf-component.aspx

    Replies
    1. Is it possible to read Hindi text as it is?

    2. Is it possible to convert a .DCM file to PDF and then to a delimited .txt file?

  2. Very useful information for beginners like me. Thank you very much.

  3. Hello, if interested, for pdf conversion to text format, you can also check out this free toolkit with more options available. Just upload your needed pdf doc and convert it in a plain text. If you want to try and see how it works, see here: http://kitpdf.com/pdf_to_text/ . Maybe it's useful for you.

  4. Replies
    1. Thx! It was useful as a first approach. It is also good to know how to convert a multi-page PDF to JPG. This website says that converting multi-page PDF pages to JPG is also possible: http://www.rasteredge.com/online/pdf/convert-pdf-to-jpeg/.

  5. you can try this free online pdf to text converter to convert pdf to text online.

  6. I read your post and need to thank you for sharing such pleasant lines. Buzz Applications is a combination of multiple services

  7. Thank you for the code, but I want to know where the file is stored after conversion.

  8. An empty document was the result...

  9. An empty document was the final result...

  10. Empty Document, the result was final ...

  11. I have been searching out for this similar kind of post for past a week and hardly came across this. Thank you very much and will look for more postings from you. I like play game five nights at freddy’s 4, game word cookies game , game hill climb racing 2 , hotmail login, and u? I hope people visit my website.

  12. It shows an error at "com" on the import line.

  13. Besides the code, please also provide details of the required JAR so that it becomes easier for us.

  14. NaturalReader Free is an exceptionally helpful program to change over composed text (MSWord, Webpage, PDF records, messages) in sound documents (MP3, WAV or CD) or in oral speech. text to speech
