Sunday, October 6, 2013

Count word frequency

This is a simple program that counts how often each word appears in an input file. The program has two classes: Counter and CountWordsFrequency. The Counter class uses two data structures, a LinkedList and a TreeMap, and has five methods that do the work.

The first method, readWords(), reads the input file line by line and splits each line into words. Before a word is added to the LinkedList, a regular expression strips its symbol characters. The Pattern class compiles the pattern to match: "\\W+" matches one or more non-word characters, that is, anything other than a letter, a digit, or an underscore. Calling replaceAll("") on the Matcher for a word removes every matching run of symbols, so "hello," becomes "hello" while "snake_case" is left unchanged.

The second method, countWords(), uses two nested loops: for each word, the inner loop counts how many times the same word appears later in the list. countWords() calls addToMap() to store each unique word and its frequency in the TreeMap. A later occurrence of a word produces a smaller count, so addToMap() only stores a word the first time it sees it, and it skips empty strings left over from tokens that were all symbols. Because a TreeMap keeps its keys sorted in their natural String order, the words come out in alphabetical order (for words of the same case) without an extra sorting step. After the TreeMap is filled, showResult() prints a table of the words, their frequencies, and their percentages.
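To see what the symbol-stripping step does to individual tokens, here is a small standalone sketch built on the same Pattern.compile("\\W+") and Matcher.replaceAll("") calls the post describes. The class name SymbolFilterDemo, the helper stripSymbols, and the sample tokens are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SymbolFilterDemo {
    // "\\W+" matches one or more characters that are NOT a letter, digit, or underscore.
    private static final Pattern SYMBOLS = Pattern.compile("\\W+");

    // Hypothetical helper: remove every run of symbols from a token,
    // keeping letters, digits, and underscores.
    public static String stripSymbols(String token) {
        Matcher mat = SYMBOLS.matcher(token);
        return mat.replaceAll("");
    }

    public static void main(String[] args) {
        String[] tokens = {"hello,", "don't", "snake_case!", "(2013)", "--"};
        for (String token : tokens) {
            System.out.println("\"" + token + "\" -> \"" + stripSymbols(token) + "\"");
        }
    }
}
```

Two consequences are worth keeping in mind. Symbols inside a word are removed too, so "don't" becomes "dont", and a token made only of symbols ("--") becomes an empty string, which is why empty words have to be filtered out before counting. Also, by default Java's \W is ASCII-only, so an accented letter such as the "é" in "café" is treated as a symbol; compiling with Pattern.UNICODE_CHARACTER_CLASS makes \w and \W Unicode-aware.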


Its final method, processCounting(), combines the methods above: it starts a background thread that calls readWords(), countWords(), and showResult() in turn. The CountWordsFrequency class invokes this method to analyze the input file and print the word frequency table. Below is the source code of the CountWordsFrequency program.

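import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Counter {
    private String filename;
    private LinkedList<String> keywordsList;
    private TreeMap<String, Integer> freqMap;

    Counter(String filename) {
        this.filename = filename;
        freqMap = new TreeMap<String, Integer>();
        keywordsList = new LinkedList<String>();
    }

    public void readWords() {
        Pattern pattern = Pattern.compile("\\W+");
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                // split a line on whitespace so we get words
                String[] words = strLine.split("\\s+");
                for (String word : words) {
                    // remove all symbols except underscore
                    Matcher mat = pattern.matcher(word);
                    // add words to the list (a token made only of symbols becomes "")
                    keywordsList.add(mat.replaceAll(""));
                }
            }
        } catch (IOException e) {
            System.err.println("Could not read " + filename + ": " + e.getMessage());
        }
    }

    public void countWords() {
        // copy to an array: LinkedList.get(i) walks the list, which is slow inside two loops
        String[] words = keywordsList.toArray(new String[0]);
        int count = 1;
        for (int i = 0; i < words.length; i++) {
            String word = words[i];
            for (int j = i + 1; j < words.length; j++) {
                if (word.equals(words[j])) {
                    count++; // increase the number of duplicate words
                }
            }
            // add the word and its frequency to the TreeMap
            addToMap(word, count);
            // reset the count variable
            count = 1;
        }
    }

    public void addToMap(String word, int count) {
        // place keyword and its frequency in TreeMap; the first occurrence of a word
        // carries its full count, so later (smaller) counts are ignored
        if (!freqMap.containsKey(word) && word.length() >= 1) {
            freqMap.put(word, count);
        }
    }

    public void showResult() {
        Set<String> keys = freqMap.keySet();
        // total number of counted words, so each percentage is a share of all words
        // (dividing by keys.size(), the number of unique words, can exceed 100%)
        int numWord = 0;
        for (int c : freqMap.values()) {
            numWord += c;
        }
        Iterator<String> iterator = keys.iterator();
        while (iterator.hasNext()) {
            String word = iterator.next();
            int count = freqMap.get(word);
            System.out.format("%-20s%-5d%-2s%n", word, count, 100 * count / numWord + "%");
        }
    }

    public void processCounting() {
        // run the whole analysis on a background thread
        Thread backprocess = new Thread() {
            @Override
            public void run() {
                readWords();
                countWords();
                showResult();
            }
        };
        backprocess.start();
    }
}

public class CountWordsFrequency {

    public static void main(String[] args) {
        if (args.length > 0 && new File(args[0]).isFile()) {
            Counter counter = new Counter(args[0]);
            counter.processCounting();
        } else {
            System.out.println("No such file name");
        }
    }
}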

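The nested loops in countWords() compare every word with every later word, so the work grows roughly with the square of the number of words in the file. A minimal sketch of a single-pass alternative, assuming Java 8 or later for Map.merge and method references: TreeMap.merge puts 1 for a word it has not seen yet and otherwise adds 1 to the stored count. This swaps in a different counting technique from the one in the post, and the names MergeCountDemo and countWords here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MergeCountDemo {
    // Count every word in one pass; TreeMap keeps the keys in natural (sorted) order.
    public static TreeMap<String, Integer> countWords(List<String> words) {
        TreeMap<String, Integer> freqMap = new TreeMap<>();
        for (String word : words) {
            if (!word.isEmpty()) {
                // Insert 1 for a new word, otherwise add 1 to its existing count.
                freqMap.merge(word, 1, Integer::sum);
            }
        }
        return freqMap;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "saw", "the", "dog", "", "the");
        TreeMap<String, Integer> freqMap = countWords(words);

        // Total of all counted (non-empty) words, used for the percentage column.
        int total = 0;
        for (int c : freqMap.values()) {
            total += c;
        }
        for (Map.Entry<String, Integer> e : freqMap.entrySet()) {
            // Integer arithmetic rounds the percentage down, as in showResult().
            System.out.format("%-20s%-5d%s%n", e.getKey(), e.getValue(), 100 * e.getValue() / total + "%");
        }
    }
}
```

Each word is looked up once, and a TreeMap lookup costs O(log n), so the whole count is roughly O(n log n) instead of quadratic. The printed table keeps the same layout as showResult(): word, count, and percentage of all counted words.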
