In this post I will discuss a method to generate the frequency table of all alphabets in a string in Java. Frequency table means a table which enumerates or lists all alphabets along with the number of times each alphabet is occurring in the string. We will not make any distinction between uppercase and lowercase characters i.e. if the string is “Allahabad” the frequency of alphabet ‘A’ is 4 and not ‘A’ is 1 or ‘a’ is 3.
The method which usually comes first in the minds of students is to calculate frequency of each alphabet by using nested loops—the outer loops iterates for all the alphabets and the inner loop iterates for all the characters of the given string; a counter is initialized to zero before the start of the inner loop and the counter is incremented inside the loop whenever a character extracted from the string is equal to the concerned alphabet. The Java implementation of the above mentioned logic will be as follows:
class Frequency{ public static void showFrequencyTable(String input){ input=input.toUpperCase(); int len=input.length(); int count; System.out.println("Frequency table for: " + input); for(char ch='A';ch<='Z'; ch++){ count=0; for(int i=0; i<len; i++){ if(ch==input.charAt(i)) count++; } if(count>0){ System.out.println(ch+": "+ count); } } } public static void main(String args[]){ showFrequencyTable("Allahabad"); showFrequencyTable("The quick brown fox jump over the lazy dog."); } }
The condition count>0 at line number 12 is for aesthetic purpose only i.e. we are suppressing lines in which the alphabet count is 0. The output of the above program is as follows:
Frequency table for: ALLAHABAD
A: 4
B: 1
D: 1
H: 1
L: 2
Frequency table for: THE QUICK BROWN FOX JUMP OVER THE LAZY DOG.
A: 1
B: 1
C: 1
D: 1
E: 3
F: 1
G: 1
H: 2
I: 1
J: 1
K: 1
L: 1
M: 1
N: 1
O: 4
P: 1
Q: 1
R: 2
T: 2
U: 2
V: 1
W: 1
X: 1
Y: 1
Z: 1
The output is correct and the above method is usually acceptable at class 9-10 level i.e. ICSE but if the same question is asked in class 11-12 i.e. ISC, one can use an array to improve the performance of the program. The above method using nested loop requires 65 times the number of alphabets in the input string. If the string is of 100 characters the loop will iterate 6500 times which is a huge number.
A more efficient method would be to take an array of size 26 for each alphabet of the English language. The idea is to store the frequency of ‘A’ at array index 0, the frequency of ‘B’ at array index 1 and so on. This concept of sacrificing some memory to speed up some operation is known as ‘space-time trade-off’ in computer science. The Java program incorporating this enhancement is as follows:
class Frequency{ public static void showFrequencyTable(String input){ input=input.toUpperCase(); int len=input.length(); int freq[] = new int[26]; char ch; for(int i=0; i<len; i++){ ch=input.charAt(i); if(Character.isLetter(ch)){ freq[ch-'A']++; } } System.out.println("Frequency table for: " + input); for(int i=0; i<freq.length; i++){ if(freq[i]>0){ System.out.println((char)(65+i)+": "+ freq[i]); } } } public static void main(String args[]){ showFrequencyTable("Allahabad"); showFrequencyTable("The quick brown fox jump over the lazy dog."); } }
The above code is first converting the string input to uppercase so that there is no distinction between uppercase and lower character as in the previous program. A loop is taken to all the characters one by one. Each extracted character is tested with isLetter() to ensure that only letter(alphabets) are being counted. The line freq[ch-‘A’]++ deserves some illustration—lets say the string is “Allahabad”. On conversion to uppercase it will be changed to “ALLAHABAD”. The first value of ch would be ‘A’. Now the index of array will be evaluate i.e. ch -‘A’ would be ‘A’ – ‘A’, code for A is 65 and hence ‘A’ – ‘A’ would be 0. This will result in freq[0]++. This way the array element corresponding to letter ‘A’ would be incremented by 1. The reader can confirm by dry run that the same will happen for other letters also. That is in case of ‘L’ the eleventh index of array freq will be incremented and so. In this logic the number of times the loop iterates would be equal to the number of characters in the input string. If the length of the string is 100 loop the main loop will iterate 100 times and the loop used for generating the output will always iterate 26 times. Thus, this method is computationally more efficient compared to the previous method which employed nested loops.
The main function has purposely been kept to minimum and its expected that students will write appropriate input/output statements as per the requirement.