Tech Ads

Back to Article List

Originally published December 2003 [ Publisher Link ]

Java Collections Framework : Grouping Java Objects


Grouping objects is possibly the most common task a software developer will perform on a day-to-day basis. From names to quantities, we have all grappled with the task of placing various values in a "capsule" so they are easier to work with. From the universal language-agnostic array introduced in programming 101 to the more cryptic structures used under certain circumstances, every programming language has a mechanism for dealing with this task.

In the Java language simply adding a pair of brackets ([ ]) is enough to define an array. For the occasions when this structure proves unfit, as in the case for a disparate group of objects or an indeterminate number of values at run-time, the Vector class can serve as a placeholder. For times when adding an associative key to each element is a necessity, the HashTable class can generally be employed.

Although the aforementioned structures probably cover most of your needs for grouping elements in Java, they are often cumbersome when you're trying to perform routine tasks like iterating or sorting values, and they're unwieldy when you're trying to mix structures amongst each other.

To help in such situations, Java's designers created what is now known as the Java Collections Framework, which offers a unified approach to working with object groups in Java. Of such importance is this novel approach to an enterprise developer that it was recently introduced as an exam objective in the Java Programmer 1.4 certification track.

The Collections Framework has three main tenets:

  • Interface Hierarchy: Since a hierarchy implies a common relation between all of its members, this allows easy interaction between different grouping structures. The Collections Framework has two main interface hierarchies: a Collection Hierarchy for grouping single values based on the Collection interface, and a Map Hierarchy for key-value groups with its parent interface Map.
  • Class Hierarchy: Since an interface provides only a design contract, the Collections Framework also provides a concrete class hierarchy which is in direct mapping to the Interface Hierarchy, as we'll see in the following sections.
  • Common Algorithms: Through the Collections Framework you are guaranteed a common set of operations on all your grouping elements, simplifying the learning curve for manipulating your elements. This is of course a derivative of the Hierarchy structures mentioned previously.

Collection Hierarchy: an array and vector alternative

The Collection Hierarchy provides a unified way to deal with single value groups. In essence it provides a way to deal with elements in the same manner you would expect with an array or vector. The following table describes the interface and (class) hierarchy that conform this part of the Java Collections Framework.

Collection
Set (HashSet / LinkedHashSet )List (ArrayList / LinkedList)
SortedSet (TreeSet)
  • Collection: It represents the root interface for aggregating single value elements in the Java Collections Framework and does not have any Class implementation associated with it. This structure has no restriction regarding the type of elements which can be placed in it.
  • Set: This interface is used for cases when no duplicate elements can be present in the grouping structure. It has two implementation classes associated with it:

    • HashSet: Is a vanilla implementation of the Set interface allowing any non-duplicate element to be present in its instance.
    • LinkedHashSet: Differs from the HashSet class in that it guarantees the initial insertion-order on later operations performed on the Class. This behaviour is achieved through a doubly-linked list maintained on each element, thus the name of the class.
  • List: This interface does allow duplicate elements within its members and has two classes which adopt its functionality:

    • ArrayList: Provides a re-sizable array implementation of the List interface, similar in functionality to a vector.
    • LinkedList: Similar to its counterpart LinkedHashSet with the exception that this class does allow duplicates, as would be expected from a List child.
  • SortedSet: As a sub-interface to Set it too does not allow duplicate elements in its content, but it is a special kind of Set in that it naturally orders its elements; it other words, it guarantees that every operation on it will return its elements in ascending order. The TreeSet class is an implementation of this interface.

The Collection Hierarchy illustrated

Now that we have covered the first half of the Java Collections Framework in theory, let's do a step-by-step code walk-through with the Collection Hierarchy.

We start with a list of values in a typical array structure and enter them in the Java Collections Framework:

String[] proposedSubjects = {
 "Linux News","PHP","Java","XML","Perl",
 "Linux Development","PHP","XML","Web-Services",
 "Linux Tools", "IT Management", "Security" ,
 "Java", "J2EE", "XML" };

// Creating a List from the previous Array
// and placing it in the Collection Interface
Collection convertedSubjects = 
                             Arrays.asList(proposedSubjects);

Having defined a Collection, we can now place it in a specific Framework class. The following code places the Collection in an ArrayList class instance:

// Creating an ArrayList instance with the previous Collection
// and placing it in the List Interface
List osdnSubjects = new ArrayList(convertedSubjects);

// At this point osdnSubjects contains the following
// [Linux News, PHP, Java, XML, Perl,
//  Linux Development, PHP, XML, Web-Services,
//  Linux Tools, IT Management, Security,
//  Java, J2EE, XML]

// Counting the elements on the HashSet which would be 15
System.out.println("We have " + 
             osdnSubjects.size()  + " proposals for OSDN");

Notice that the ArrayList instance is assigned to its natural hierarchy partner interface List. So far so good, but after inspecting the contents of your List you notice that it has duplicate values. How could you easily eliminate all duplicates from your List? A Set would be a natural choice:

// Creating a HashSet instance with the previous List
// and placing it in the Set Interface
Set commonOsdnSubjects = new HashSet(osdnSubjects);

// At this point commonOsdnSubjects would print the following
// [Linux News, Linux Tools, Security, Java,
//  Web-Services, PHP, J2EE, IT Management,
//  Perl, XML, Linux Development]

// Counting the elements on the HashSet which would be 11
System.out.println("We have " + 
  commonOsdnSubjects.size()  + " common proposals for OSDN");

First off, realize that we were able to use the osdnSubjects object to create a HashSet even though osdnSubjects is a List. This is possible because both are inherited from the Collection interface. If you were to print out this structure you would notice that every duplicate element placed in commonOsdnSubjects is automatically dropped because of Set's nature.

In both the previous code snippets you were also able to use the size() method, which is one of the common algorithms provisioned in the Java Collections Framework.

So far we have just begun to explore the power of using the Java Collections Framework. In the next block of code we will automatically sort every element in alphabetical order, perform an iteration over all its elements, and compare the contents of each structure in a single operation.

The following fragment illustrates all these functionalities through the SortedSet interface using the TreeSet class:


// Creating a TreeSet instance with the initial List
// and placing it in the SortedSet Interface
SortedSet sortedCommonOsdnSubjects = 
                                   new TreeSet(osdnSubjects);

// At this point sortedCommonOsdnSubjects would print the
// following [IT Management, J2EE, Java, Linux Development,
//  Linux News, Linux Tools, PHP, Perl,
//  Security, Web-Services, XML]

// Lets just check if no subjects have been mishandled from 
// the start. The following statment would result true
boolean alliswell = 
          sortedCommonOsdnSubjects.containsAll(osdnSubjects);

// Print the result of the comparison
System.out.println("Do we still have every subject 
                         from the start : " + alliswell);

// Print everything out with an Iterator
for (Iterator it = 
           sortedCommonOsdnSubjects.iterator(); it.hasNext(); ) {
    System.out.println(it.next());
}

We start here defining a TreeSet instance with the initial List and placing it in the corresponding SortedSet interface. SortedSet is comparable to the Set interface defined earlier, but through the SortedSet interface not only is every duplicate element in the original osdnSubjects object dropped, but also every element is sorted in alphabetical order.

The containsAll method is another algorithm which allows us to compare the contents of any Collections Framework structure in a single line. In this case we are verifying that every element present in the newly created sortedCommonOsdnSubjects object is also present in the original osdnSubjects List.

Finishing off the code, we employ the Iterator class, which is the preferred approach to inspecting each element in a Java Collection Structure. This holds true since iterator() is a common method available for every Collection Hierarchy Class, which in turn produces an Iterator class. Once a particular instance of Iterator is created, the hasNext() method can be used in conjunction with a for cycle to produce an iteration over each element, with the next() method being employed to access the particular contents of the element in question.

Up to this point we have covered the basic mechanisms for grouping single elements with the Java Collections Framework, next we will illustrate how to group elements whilst assigning them to an associative key for later reference.

Map Hierarchy: Grouping key-values

The Map hierarchy covers the other half of the Java Collection Framework and is employed for aggregating key-value lists in a Java Application. The following table describes the interface and class hierarchy that comprise the other spectrum of the Java Collection Framework.

Map (HashMap / IdentityHashMap / LinkedHashMap )
SortedMap (TreeMap)
  • Map: The parent interface for grouping key-value lists in the Java Collections Framework, it is designed not to allow any duplicate keys in its structure, although duplicate values are permitted. It has three implementation classes associated with it:

    • HashMap: A vanilla implementation of the Map interface, similar in functionality to the classic HashTable class.
    • LinkedHashMap: Differs from the HashMap class in that it guarantees the initial insertion-order on later operations performed on the Class. This behaviour is achieved through a doubly-linked list maintained on each element, thus the name of the class.
    • IdentityHashMap: This particular class allows you to use reference-equality with all keys and values placed in its structure. In this class two keys k1 and k2 are considered equal if and only if k1==k2, while in the HashMap class two keys k1 and k2 are considered equal if and only if k1.equals(k2).
  • SortedMap: As a sub-interface to Map, SortedMap does not allow any duplicate key in its content, however it is a special kind of Map in that it naturally orders its keys; it other words, it guarantees that every operation on it will return the elements in ascending order based upon the key value. The TreeMap class is an implementation of this interface.

The Map Hierarchy illustrated

Taking the example described in the Collection Hierarchy section, we will now convert these same subjects into Map keys and associate a value to each one:

// Creating a TreeMap instance
// and placing it in the Map Interface
Map subjectsBySite = new TreeMap();

// Placing values inside the TreeMap instance
subjectsBySite.put("Linux News","newsforge.com");
subjectsBySite.put("PHP","devchannel.org");
subjectsBySite.put("Java","devchannel.org");
subjectsBySite.put("XML","devchannel.org");
subjectsBySite.put("Perl","linux.com");
subjectsBySite.put("Linux Development","linux.com");
subjectsBySite.put("PHP","linux.com");
subjectsBySite.put("XML","devchannel.org");
subjectsBySite.put("Web-Services","devchannel.org");
subjectsBySite.put("Linux Tools","linux.com");
subjectsBySite.put("IT Management","itmanagersjournal.com");
subjectsBySite.put("Security","devchannel.org");
subjectsBySite.put("Java","devchannel.org");
subjectsBySite.put("J2EE","devchannel.org");
subjectsBySite.put("XML","devchannel.org");

// Counting key-value elements on the Map which would be 11
System.out.println(subjectsBySite.size());

We start by creating a TreeMap instance which is assigned to the root interface Map. Afterwards the put method is used to populate the Map with 15 key-values. Since a Map does not allow duplicate key names, any duplicate keys will be dropped, which results in an 11 element TreeMap. Another effect a TreeMap has over its elements is that every element will be sorted in ascending order based on its key value.

In this next section we create a Set which will contain each key-value present in the aforementioned Map. This structure will ease iterating over every value present in the TreeMap:

// Defining a Map.Entry Set over the Map
// Necessary for iterating in one cycle
Set subjectAndSite = subjectsBySite.entrySet();

// Defining a place-holder for Linux related Sites
// Using Set to avoid duplicates
Set linuxSites = new HashSet();

// Defining an Iterator
for (Iterator it = subjectAndSite.iterator(); it.hasNext(); ) {
    Map.Entry sAndSEntry = (Map.Entry) it.next();
    String theme = (String) sAndSEntry.getKey();

// Selecting any subjects that contains Linux
    if (theme.startsWith("Linux")) {
	// Placing the Linux related site in the Linux Set
	linuxSites.add(sAndSEntry.getValue());
   }
}

As you will notice, each iteration extracts a key-value which is placed into a Map.Entry reference. Each key value is placed in a String, which is later verified for a pattern starting with "Linux." If the key matches this pattern then the corresponding value of the key-value is placed in the Set named linuxSites which was defined prior to the Iteration.

A final section illustrates other useful mechanisms that can be used to manipulate Collections Framework elements:

// Extracting values (sites) from original Map
Collection allSites = subjectsBySite.values();

// Filtering values (sites)
Set nonLinuxSites = new HashSet(allSites);
nonLinuxSites.removeAll(linuxSites);

System.out.println("If Linux is your cup of tea then you 
                          will like : " + linuxSites);
System.out.println("If your weary of the penguin, then you 
                          can visit : " + nonLinuxSites);

The first line creates a Collection from every value present in the original Map by using the values() method, which automatically creates this structure. Afterwards this Collection is converted to a Set in order to eliminate duplicate values. Finally, with the removeAll() method you can eliminate every value present in nonLinuxSites that is included in the linuxSites Set.

Although this article covers all the Interfaces and Classes which compose the Java Collections Framework, the code shown outlines only a fraction of what you can achieve through the Framework. These structures offer tremendous depth when it comes to manipulating grouped objects in Java. They can be mastered only through practice. I encourage you to explore the Java Collections Framework and all of its functionalities present in the java.util package of any J2SE/JDK, which will surely lead you to be a better and more efficient Java coder.


Originally published December 2003 [ Publisher Link ]

Back to Article List