PDFToTextConverter source code:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
public class PDFToTextConverter {
    public static void main(String[] args) {
        selectPDFFiles();
    }

    //let the user pick one or more PDF files and convert each one
    public static void selectPDFFiles() {
        JFileChooser chooser = new JFileChooser();
        FileNameExtensionFilter filter = new FileNameExtensionFilter("PDF", "pdf");
        chooser.setFileFilter(filter);
        chooser.setMultiSelectionEnabled(true);
        int returnVal = chooser.showOpenDialog(null);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            File[] files = chooser.getSelectedFiles();
            System.out.println("Please wait...");
            //output files are named textfrompdf0.txt, textfrompdf1.txt, ...
            for (int i = 0; i < files.length; i++) {
                convertPDFToText(files[i].toString(), "textfrompdf" + i + ".txt");
            }
            System.out.println("Conversion complete");
        }
    }
    //extract the text of every page of src and write it to the text file desc
    public static void convertPDFToText(String src, String desc) {
        PdfReader pr = null;
        //try-with-resources flushes and closes the writers even if extraction fails
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(desc))) {
            //create pdf reader
            pr = new PdfReader(src);
            //get the number of pages in the document
            int pNum = pr.getNumberOfPages();
            //extract text from each page (pages are numbered from 1) and write it to the output text file
            for (int page = 1; page <= pNum; page++) {
                String text = PdfTextExtractor.getTextFromPage(pr, page);
                bw.write(text);
                bw.newLine();
            }
        } catch (Exception e) {
            //report the failure and let the caller continue with the next file
            e.printStackTrace();
        } finally {
            //release the PDF file handle
            if (pr != null) {
                pr.close();
            }
        }
    }
}
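The listing above passes a relative name such as textfrompdf0.txt to FileWriter, so the output lands in the directory the program was started from (the JVM's working directory), not next to the selected PDF. As a minimal sketch of one alternative, the helper below derives a .txt path beside the source file. The class name OutputNameSketch and the method textFileFor are hypothetical names introduced here for illustration; they are not part of the original program or of iText.

```java
import java.io.File;

public class OutputNameSketch {
    /**
     * Hypothetical helper: returns a file in the same directory as the given
     * PDF, with the same base name and a .txt extension.
     */
    static File textFileFor(File pdf) {
        String name = pdf.getName();
        int dot = name.lastIndexOf('.');
        // Strip the extension only if there is one and the name does not start with the dot
        String base = (dot > 0) ? name.substring(0, dot) : name;
        // A null parent makes File resolve the name against the working directory
        return new File(pdf.getParentFile(), base + ".txt");
    }

    public static void main(String[] args) {
        File out = textFileFor(new File("reports", "summary.pdf"));
        System.out.println(out.getPath());
    }
}
```

Under that assumption, the call in selectPDFFiles could pass textFileFor(files[i]).getPath() as the second argument instead of the numbered name.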
Converting a PDF document to a text file is simple. First, use the PdfReader class (in the iText library) to open the PDF document and get its number of pages. Once you have the PdfReader object, you can extract the text by calling the getTextFromPage(PdfReader pdfreader, int page_num) method of the PdfTextExtractor class once per page. This method extracts the text of a single page of the PdfReader object; pages are numbered from 1. As each page's text is extracted, use the BufferedWriter class to write it out to the destination file.
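The page loop described above can be sketched without the iText jar on the classpath, which makes it easy to try out. In this sketch the PageTextSource interface is a hypothetical stand-in for the call PdfTextExtractor.getTextFromPage(reader, page), and PageLoopSketch and writePages are likewise names invented for illustration; only the loop shape and the writer handling mirror the program.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class PageLoopSketch {
    /** Hypothetical stand-in for PdfTextExtractor.getTextFromPage(reader, page). */
    interface PageTextSource {
        String textOfPage(int page) throws IOException;
    }

    /** Writes the text of pages 1..pageCount to out, with a line separator after each page. */
    static void writePages(int pageCount, PageTextSource source, Writer out) throws IOException {
        // try-with-resources flushes and closes the writer even if extraction throws
        try (BufferedWriter bw = new BufferedWriter(out)) {
            // Pages are numbered from 1, so the loop runs from 1 to pageCount inclusive
            for (int page = 1; page <= pageCount; page++) {
                bw.write(source.textOfPage(page));
                bw.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory sink and fake page text, so no PDF file is needed
        StringWriter sink = new StringWriter();
        writePages(3, page -> "text of page " + page, sink);
        System.out.print(sink);
    }
}
```

In the real program the lambda would be replaced by a call into iText, and the StringWriter by a FileWriter on the destination file.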
I used the Aspose.PDF for .NET API to convert my text files to PDF, and it produced very good results, exactly what I wanted. You can also convert PDF files to text, even if your files are large. Try this API; I hope you will like it too.
http://www.aspose.com/java/pdf-component.aspx
Is it possible to read Hindi text as well?
As it is.
Is it possible to convert a .DCM file to PDF and then to a delimited .txt file?
Thank you for sharing.
Very useful information for beginners like me. Thank you very much.
Hello, if you are interested in converting PDF to text, you can also check out this free toolkit, which has more options available. Just upload your PDF document and convert it to plain text. If you want to see how it works, see here: http://kitpdf.com/pdf_to_text/ . Maybe it's useful for you.
Thank you, good document.
Thanks! It was useful as a first approach. It is easy to convert a multi-page PDF to a single JPG. This website says that converting multi-page PDF pages to JPG is also possible: http://www.rasteredge.com/online/pdf/convert-pdf-to-jpeg/.
You can try this free online PDF to text converter to convert PDF to text online.
.NET PDF To Text Converter: convert a PDF file into a text file
I read your post and want to thank you for sharing such pleasant lines. Buzz Applications is a combination of multiple services
Thank you for the code, but I want to know where the file is stored after conversion.
You shared a very useful post. Thanks for sharing.
ReplyDeleteMagento Development in Chennai
Good post. Keep sharing such a useful post.
HAPPY
Empty document was the result...
Empty document was the final result...
Empty Document, the result was final ...
Can you please explain the code?
I have been searching for this kind of post for the past week and hardly came across any. Thank you very much; I will look for more posts from you. I like playing Five Nights at Freddy's 4, Word Cookies, Hill Climb Racing 2, hotmail login, and you? I hope people visit my website.
It is showing an error at "com" on the import line.
In addition to the code, please also provide details of the jar so that it becomes easy for us.
NaturalReader Free is an exceptionally helpful program to convert written text (MS Word, web pages, PDF files, emails) into audio files (MP3, WAV or CD) or into speech. text to speech
Thanks for sharing this useful article, keep posting more like this.