Huffman Coding Java17 Mar 2025 | 9 min read The Huffman Coding Algorithm was proposed by David A. Huffman in 1950. It is a lossless data compression mechanism. It is also known as data compression encoding. It is widely used in image (JPEG or JPG) compression. In this section, we will discuss the Huffman encoding and decoding, and also implement its algorithm in a Java program. We know that each character is a sequence of 0's and 1's and stores using 8-bits. The mechanism is called fixed-length encoding because each character uses the same number of fixed-bit storage. Here, a question ascends, is it possible to reduce the amount of space required to store a character? Yes, it is possible by using variable-length encoding. In this mechanism, we exploit some characters that occur more frequently in comparison to other characters. In this encoding technique, we can represent the same piece of text or string by reducing the number of bits. Huffman EncodingHuffman encoding implements the following steps.
Huffman coding follows a prefix rule that prevents ambiguity while decoding. The rule also ensures that the code assigned to the character is not treated as a prefix of the code assigned to any other character. There are the following two major steps involved in Huffman coding:
Let's brief the above two steps. Huffman TreeStep 1: For each character of the node, create a leaf node. The leaf node of a character contains the frequency of that character. Step 2: Set all the nodes in sorted order according to their frequency. Step 3: There may exist a condition in which two nodes may have the same frequency. In such a case, do the following:
Step 4: Repeat step 2 and 3 until all the node forms a single tree. Thus, we get a Huffman tree. Huffman Encoding ExampleSuppose, we have to encode string abracadabra. Determine the following:
(i) Huffman Code for All the CharactersIn order to determine the code for each character, first, we construct a Huffman tree. Step 1: Make pairs of characters and their frequencies. (a, 5), (b, 2), (c, 1), (d, 1), (r, 2) Step 2: Sort pairs with respect to frequency, we get: (c, 1), (d, 1), (b, 2) (r, 2), (a, 5) Step 3: Pick the first two characters and join them under a parent node. ![]() We observe that a parent node does not have a frequency so, we must assign a frequency to it. The parent node frequency will be the sum of its child nodes (left and right) i.e. 1+1=2. ![]() Step 4: Repeat Steps 2 and 3 until, we get a single tree. We observe that the pairs are already in a sorted (by step 2) manner. Again, pick the first two pairs and join them. ![]() We observe that a parent node does not has a frequency so, we must assign a frequency to it. The parent node frequency will be the sum of its child nodes (left and right) i.e. 2+2=4. ![]() Again, we check if the pairs are in a sorted manner or not. At this step, we need to sort the pairs. ![]() According to step 3, pick the first two pairs and join them, we get: ![]() We observe that a parent node does not have a frequency so, we must assign a frequency to it. The parent node frequency will be the sum of its child nodes (left and right) i.e. 2+4=6. ![]() Again, we check if the pairs are in a sorted manner or not. At this step, we need to sort the pairs. After sorting the tree looks like the following: ![]() According to step 3, pick the first two pairs and join them, we get: ![]() We observe that a parent node does not have a frequency so, we must assign a frequency to it. The parent node frequency will be the sum of its child nodes (left and right) i.e. 5+6=11. ![]() Therefore, we get a single tree. At last, we will find the code for each character with the help of the above tree. Assign a weight to each edge. Note that each left edge-weighted is 0 and the right edge-weighted is 1. ![]() We observe that input characters are only presented in the leave nodes and the internal nodes have null values. In order to find the Huffman code for each character, traverse over the Huffman tree from the root node to the leaf node of that particular character for which we want to find code. The table describes the code and code length for each character.
We observe that the most frequent character gets the shortest code length and the less frequent character gets the largest code length. Now we can encode the string (abracadabra) that we have taken above. (ii) Average Code Length for the StringThe average code length of the Huffman tree can be determined by using the formula given below: = { (5 x 1) + (2 x 3) + (1 x 4) + (1 x 4) + (2 x 2) } / (5+2+1+1+2) = 2.09090909 (iii) Length of the Encoded StringThe length of the encoded message can be determined by using the following formula: = 11 x 2.09090909 = 23 bits Huffman Encoding AlgorithmThe Huffman algorithm is a greedy algorithm. Since at every stage the algorithm looks for the best available options. The time complexity of the Huffman encoding is O(nlogn). Where n is the number of characters in the given text. Huffman DecodingHuffman decoding is a technique that converts the encoded data into initial data. As we have seen in encoding, the Huffman tree is made for an input string and the characters are decoded based on their position in the tree. The decoding process is as follows:
The child node holds the input character. It is assigned the code formed by subsequent 0s and 1s. The time complexity of decoding a string is O(n), where n is the length of the string. Huffman Encoding and Decoding Java ProgramIn the following program, we have used data structures like priority queues, stacks, and trees to design a compression and decompression logic. We will base our utilities on the widely used algorithmic technique of Huffman coding. HuffmanCode.java Output: Huffman Codes of the characters are: {p=000, a=110, t=111, v=001, i=010, j=011, n=100, o=101} The initial string is: javatpoint The encoded string is: 011110001110111000101010100111 The decoded string is: javatpoint Next TopicJava Snippet Class |
Variables are an essential part of data storage and manipulation in the realm of programming. In addition to making values available within a programme, they offer a means of holding them temporarily. Not all variables, though, are made equally. Each variable has a scope that specifies...
3 min read
What is an ArrayList? In Java, ArrayList is a resizable array implementation. ArrayList expands dynamically, ensuring that there is always room for more elements to be added. An array of the Object class serves as the ArrayList's underpinning data structure. In Java, there are three constructors for...
4 min read
A full-stack developer is a person who can develop application's backend and frontend. Java full-stack is basically a term used for a web developer that uses Java to develop the entire technology stack is referred to as Java full stack developer. A developer should have the following...
8 min read
In Java, BLOB and CLOB are the two data types used to store binary and character large objects, respectively. It is different from other data types like float, int, double, etc. Collectively it refers to as LOB (Large Objects). In this section, we will discuss the BLOB...
4 min read
Finding the intersection of arrays with distinct elements in Java involves identifying common elements shared by two or more arrays. Since the elements are unique within each array, the task simplifies to efficiently comparing sets. This process is useful in various applications like data filtering, set...
8 min read
Java and .NET are two of the most prominent development platforms used for building a variety of applications. Both have their strengths and are chosen based on the specific needs of a project. Below is a detailed comparison of Java and .NET. Overview of Java and...
4 min read
Java SortedSet<E> Interface The SortedSet<E> interface in Java is part of the Java Collections Framework and provides a collection of unique elements, where the elements are stored in sorted order. It extends the Set<E> interface. It was introduced in Java 2 and has been an essential...
9 min read
An input array inputArr[] is given to us that contains n numbers. Our task is to find the minimum difference between the two sub-arrays. The sub-arrays are made from the given input array. If an element is a part of one sub-array, then it can not...
8 min read
Java does not support multiple inheritance with classes to avoid the diamond problem, which causes ambiguity when multiple parents provide methods with the same signature. However, with the introduction of default methods in Java 8, multiple inheritance is supported using interfaces. While this enhances flexibility, conflicts...
6 min read
The ORE number is a special positive number which is also known as a harmonic divisor number. A number is said to be ORE if its divisors have a harmonic value as an integer. Unlike Prime and Armstrong numbers, the ORE number is rarely asked by...
7 min read
We request you to subscribe our newsletter for upcoming updates.
We provides tutorials and interview questions of all technology like java tutorial, android, java frameworks
G-13, 2nd Floor, Sec-3, Noida, UP, 201301, India