What is a Binary Tree?

1)
Week 6- Initial Response
DQ2: We have discussed several complex data structure (e. g. linked lists, stacks, queues, and
trees), choose one of these complex data structures, and illustrate how you can use it to solve a
practical problem (at least 300 words). (This is not a coding task; however, you are welcome to
provide a coding solution)
Solution
A data structure is a collection of information stored and organized in a particular way to make
algorithms more efficient; queues, stacks, linked lists and trees are examples. Here I would like to
talk about the tree as one type of data structure.
A tree is a widely used, node-based data structure. It is a collection of entries that have a
hierarchical organization, like the organization chart of a company: the CEO is represented at the
top, with lines branching down to the vice presidents, who are followed by middle managers and so
on. The element at the top of the tree is called the root. The elements that are directly under an
element are called its children. The element directly above an element is called its parent.
Finally, elements with no children are called leaves.
Nodes:
A node may contain a value or a condition, or it may represent a separate data structure or a
tree of its own. Each node in a tree has zero or more child nodes, which are below it in
the tree.
Root node:
The topmost node in a tree is called the root node. Being the topmost node, the root
node has no parent. It is the node at which operations on the tree commonly
begin, although some algorithms begin with the leaf nodes and work up, ending at the
root. All other nodes can be reached from it by following edges or links.
Leaf nodes:
Nodes that have no children are called leaf nodes. They are usually drawn at the bottom of
the tree, but a leaf can occur at any level, not only the bottommost one.
Internal nodes:
An internal node or inner node is any node of a tree that has child nodes and is thus not
a leaf node.
A good example of using a tree is binary search. In this case the tree is called a binary search
tree; it keeps numbers or objects in sorted order, which makes them easy to search.
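As a rough illustration of the idea, here is a minimal sketch of a binary search tree of integers. The class and method names (SimpleBst, insert, contains, countLeaves) are assumptions made for this example and are not taken from any library.

```java
// Minimal binary search tree of integers (illustrative sketch; names are hypothetical).
class SimpleBst {
    private static class Node {
        final int value;
        Node left, right;

        Node(int value) { this.value = value; }

        // A leaf is a node with no children.
        boolean isLeaf() { return left == null && right == null; }
    }

    private Node root;

    // Insert a value; duplicates are ignored.
    void insert(int value) {
        root = insert(root, value);
    }

    private Node insert(Node node, int value) {
        if (node == null) return new Node(value);
        if (value < node.value) node.left = insert(node.left, value);
        else if (value > node.value) node.right = insert(node.right, value);
        return node;
    }

    // Search by walking down from the root, going left or right at each node.
    boolean contains(int value) {
        Node node = root;
        while (node != null) {
            if (value == node.value) return true;
            node = value < node.value ? node.left : node.right;
        }
        return false;
    }

    // Count the nodes that have no children.
    int countLeaves() {
        return countLeaves(root);
    }

    private int countLeaves(Node node) {
        if (node == null) return 0;
        if (node.isLeaf()) return 1;
        return countLeaves(node.left) + countLeaves(node.right);
    }
}
```

Inserting 50, 30, 70, 20 and 40 in that order gives a tree whose root is 50, whose internal nodes are 50 and 30, and whose leaves are 20, 40 and 70.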
References
Wikipedia, (2010) Data structure [Online] Available from:
http://en.wikipedia.org/wiki/Data_structure (Accessed: 11 July 2010).
Wikipedia, (2010) Tree (data structure) [Online] Available from:
http://en.wikipedia.org/wiki/Tree_(data_structure) (Accessed: 11 July 2010).
Question:
1. Can you provide a more elaborate example of a practical problem?
2. What is the difference between a leaf node and an internal node?
3. What is the difference between a tree and a graph?
2)
Work queues and threads
A queue is a data structure that maintains an ordered group of items, where the items are (usually)
inserted at one end of the queue and then removed in the same order from the other end of the
queue. This is known as a first-in, first-out (FIFO) queue and is ideal for job (or request) processing.
FIFO queue implementations are found virtually everywhere in computer hardware and software and
naturally also play an important role in server applications that need to process a lot of requests or
other (small) jobs concurrently.
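The FIFO behaviour described above can be sketched with the standard java.util.Queue interface. The class name FifoDemo and the method drain are assumptions made for this illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class FifoDemo {
    // Enqueue the given jobs, then dequeue them all and return the order in which they left.
    static List<String> drain(String... jobs) {
        Queue<String> queue = new ArrayDeque<>();
        for (String job : jobs) {
            queue.offer(job);            // insert at the tail of the queue
        }
        List<String> processed = new ArrayList<>();
        while (!queue.isEmpty()) {
            processed.add(queue.poll()); // remove from the head of the queue
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(drain("job1", "job2", "job3")); // prints [job1, job2, job3]
    }
}
```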
A common design pattern in a high-transaction environment is to create a work queue that
stores (queues) the transactions (or jobs) that need to be processed. The queued jobs are then
taken from the queue in order and processed concurrently by a fixed number of threads (called a
thread pool).
While Java provides everything we need to accomplish this task, it is good to understand how it
works by looking at a simple example.
The example implementation below has been adapted from an example provided by IBM. (There are
many more advanced implementations that may also allocate a priority on queue items, etc.)
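For comparison with the hand-written service shown in this post, the following is a minimal sketch of the same idea using the thread pool that ships with Java in java.util.concurrent. The class name ExecutorDemo and the method runJobs are assumptions for this example, and the lambda syntax requires Java 8 or later.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ExecutorDemo {
    // Run `jobs` small jobs on a pool of `poolSize` threads and return how many completed.
    static int runJobs(int poolSize, int jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 1; i <= jobs; i++) {
            final int id = i;
            // execute() queues the job; an idle worker thread picks it up
            pool.execute(() -> {
                System.out.println("Job" + id + " on " + Thread.currentThread().getName());
                completed.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new jobs, but finish the queued ones
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS); // wait for the queue to drain
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runJobs(2, 5) + " jobs completed");
    }
}
```

The order of the printed lines depends on thread scheduling, so it can differ from run to run.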
Example implementation of a work queue
We are going to add 5 jobs to the work queue, and each job will print the numbers 1 to 5 along with
its thread name.
The jobs are implemented as a Thread (or with the Runnable interface) and only need to provide a
run() method:
/* MyCounterJob.java */
public class MyCounterJob extends Thread {
    @Override
    public void run() {
        try {
            for (int i = 0; i < 5; i++) {
                System.out.println(this.getName() + " count: " + (i + 1));
                // we will now sleep to allow other jobs to run
                // Please note that this is just an example and needs to be
                // revised for your own use.
                Thread.sleep(5);
            }
        } catch (InterruptedException e) {
            // restore the interrupt flag instead of silently swallowing it
            Thread.currentThread().interrupt();
        }
    }
}
The work queue is implemented as a service. New jobs are added to the queue by calling the
execute() method with an instance of the job (Thread/Runnable).
The WorkQueueService class maintains a queue of jobs, which are processed by the specified
number of pool worker threads. Each pool worker thread retrieves the next job from the queue and
then calls its run() method as defined above.
The service is started automatically when a new instance of the WorkQueueService class is created
and remains active until the stopService() method is called. If the service has previously been
stopped, it automatically resumes processing when new jobs are added to the queue.
/* WorkQueueService.java */
import java.util.LinkedList;

public class WorkQueueService {
    private final int nThreads;
    private final PoolWorker[] threads;
    private final LinkedList<Runnable> queue;
    // volatile so that the worker threads see changes made by other threads
    private volatile boolean stop = false;

    public WorkQueueService(int nThreads) {
        this.nThreads = nThreads;
        queue = new LinkedList<Runnable>();
        threads = new PoolWorker[nThreads];
        startService();
    }

    private void startService() {
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new PoolWorker();
            threads[i].start();
        }
    }

    public void stopService() {
        stop = true;
        // wake up idle workers so that they notice the stop flag and exit
        synchronized (queue) {
            queue.notifyAll();
        }
    }

    public void execute(Runnable r) {
        if (stop) {
            // start the service again (this simple restart assumes that the
            // previous workers have already finished)
            stop = false;
            startService();
        }
        synchronized (queue) {
            queue.addLast(r);
            queue.notify();
        }
    }

    // true when no jobs are waiting in the queue
    public boolean allJobsComplete() {
        synchronized (queue) {
            return queue.isEmpty();
        }
    }

    private class PoolWorker extends Thread {
        @Override
        public void run() {
            while (true) {
                Runnable r;
                synchronized (queue) {
                    // wait for a job, unless the service has been stopped
                    while (queue.isEmpty() && !stop) {
                        try {
                            queue.wait();
                        } catch (InterruptedException ignored) {
                        }
                    }
                    if (queue.isEmpty()) {
                        return; // stopped and nothing left to process
                    }
                    r = queue.removeFirst();
                }
                try {
                    r.run();
                } catch (RuntimeException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static void main(String[] args) {
        // start the service with 2 threads in the pool
        WorkQueueService workQueueService = new WorkQueueService(2);
        // create 5 jobs
        for (int i = 0; i < 5; i++) {
            MyCounterJob job = new MyCounterJob();
            job.setName("Job" + (i + 1));
            workQueueService.execute(job);
        }
        // stop the WorkQueueService service once all jobs have been processed
        workQueueService.stopService();
    }
}
Output from the program:
Job1 count: 1
Job2 count: 1
Job1 count: 2
Job2 count: 2
Job1 count: 3
Job2 count: 3
Job1 count: 4
Job2 count: 4
Job1 count: 5
Job2 count: 5
Job3 count: 1
Job4 count: 1
Job3 count: 2
Job4 count: 2
Job3 count: 3
Job4 count: 3
Job3 count: 4
Job4 count: 4
Job3 count: 5
Job4 count: 5
Job5 count: 1
Job5 count: 2
Job5 count: 3
Job5 count: 4
Job5 count: 5
I created the work queue with 2 processing threads (thread pool workers) and from the output we
can see that the jobs are processed concurrently, two at a time.
References
Deitel, P., Deitel, H. (2010) Java How to Program 8th edition. New Jersey: Pearson Prentice Hall
IBM (2002) Java theory and practice: Thread pools and work queues [Online]. Available from:
http://www.ibm.com/developerworks/library/j-jtp0730.html (Accessed: 11 July, 2010)
Questions:
1. What is dequeue?
2. What is the difference between queue and dequeue?
3)
W6DQ2
This document discusses binary trees as a logical data structure, without reference to
any particular programming language.
What is a Tree?
"A Tree is a collection whose entries have a hierarchical organization similar to that of an
organization chart of a typical company" (Brookshear 2009)
Each position in a tree is called a node.
In this scenario the president is at the top (in tree jargon, the root node).
The president has two or more vice presidents (child nodes).
Each vice president has one or more section managers, and so on.
Finally, there are the employees who do not manage anyone (leaf nodes).
All the nodes that have the same parent are called siblings. In the organization chart, for
example, all vice presidents are siblings. In the same way, the manager of subsection A1 and the
manager of subsection A2 are siblings (and so on).
The length of the path from the root node to a node is called the depth of the node (the root node
has depth zero). All the nodes with the same depth form a level of the tree.
The length of the longest path from the root node to a leaf is usually referred to as the height of
the tree.
If we take any node of the tree, we can see that, together with the nodes below it, it forms another
tree structure. This smaller structure, which has the same characteristics, is called a subtree.
An important property of a tree is that lower subtrees cannot join together further down. In terms
of the example, this means that an employee cannot have two superiors.
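The terms above can be made concrete with a small sketch of a general tree. The class name OrgNode and its fields are assumptions for this illustration; depth and height are both counted in edges, so the root has depth zero. Each node stores a single parent, which is what prevents an employee from having two superiors.

```java
import java.util.ArrayList;
import java.util.List;

// One position in an organization chart (illustrative sketch; names are hypothetical).
class OrgNode {
    final String title;
    final OrgNode parent;                      // null for the root node
    final List<OrgNode> children = new ArrayList<>();

    OrgNode(String title, OrgNode parent) {
        this.title = title;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);         // register with the single parent
        }
    }

    // A leaf node has no children.
    boolean isLeaf() {
        return children.isEmpty();
    }

    // Depth: number of edges on the path from the root to this node (root = 0).
    int depth() {
        int d = 0;
        for (OrgNode n = parent; n != null; n = n.parent) {
            d++;
        }
        return d;
    }

    // Height of the subtree rooted here: edges on the longest path down to a leaf.
    int height() {
        int h = 0;
        for (OrgNode child : children) {
            h = Math.max(h, 1 + child.height());
        }
        return h;
    }
}
```

For a chart with a president, two vice presidents, one manager under the first vice president and one employee under that manager, the employee has depth 3 and the whole tree has height 3.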
What is a Binary Tree?
A binary tree is a particular form of tree in which each parent can have at most two children.
Usually this structure is built to respect an ordering property, in which case it is called a binary
search tree. In a binary search tree with no duplicates, the data in every node of the left subtree is
lower than the data in the parent node, and the data in every node of the right subtree is greater
than the data in the parent node.
"Data" here means any comparable entity (an Integer, Double, String or any other Object) that
can be compared with others of its own kind.
Let's represent, for example, a binary tree with the nodes A, B, C, D, E, F, G, H, where G is the root
node.
        G
       / \
      D   H
     / \
    B   E
   / \   \
  A   C   F
There are three main ways to traverse a tree recursively: preorder, inorder and postorder (Deitel,
2010).
In a preorder traversal the value of the node is processed first, then the values in the left
subtree and finally the values in the right subtree.
In an inorder traversal the left subtree is processed first, then the value of the current node
and then the right subtree.
In a postorder traversal the left and right subtrees are processed first and the value of the node
last.
In our example the three modes would traverse the tree in the following order:
- Preorder: G, D, B, A, C, E, F, H.
- Inorder: A, B, C, D, E, F, G, H.
- Postorder: A, C, B, F, E, D, H, G.
When the tree is a binary search tree, the inorder traversal visits the values
in ascending order.
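The three traversals can be sketched recursively as follows. The class TraversalDemo and its helper example(), which rebuilds the tree with root G used in this post, are assumptions made for the illustration.

```java
class TraversalDemo {
    static class Node {
        final char value;
        final Node left, right;

        Node(char value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Node first, then left subtree, then right subtree.
    static String preorder(Node n) {
        if (n == null) return "";
        return n.value + preorder(n.left) + preorder(n.right);
    }

    // Left subtree first, then node, then right subtree.
    static String inorder(Node n) {
        if (n == null) return "";
        return inorder(n.left) + n.value + inorder(n.right);
    }

    // Left subtree first, then right subtree, then node.
    static String postorder(Node n) {
        if (n == null) return "";
        return postorder(n.left) + postorder(n.right) + n.value;
    }

    // The example tree: G is the root, D and H its children, and so on.
    static Node example() {
        Node b = new Node('B', new Node('A', null, null), new Node('C', null, null));
        Node e = new Node('E', null, new Node('F', null, null));
        Node d = new Node('D', b, e);
        return new Node('G', d, new Node('H', null, null));
    }

    public static void main(String[] args) {
        Node root = example();
        System.out.println(preorder(root));  // prints GDBACEFH
        System.out.println(inorder(root));   // prints ABCDEFGH
        System.out.println(postorder(root)); // prints ACBFEDHG
    }
}
```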
These traversal modes also correspond to different mathematical notations (Lerma
2004):
- Prefix notation (also known as Polish Notation) uses the preorder mode.
- Infix notation is the classical notation.
- Postfix notation (also known as Reverse Polish Notation) uses the postorder mode.
Let's take, for example, the (infix) expression a * (b + c). In a tree structure it can be
represented as follows:
    *
   / \
  a   +
     / \
    b   c
This, of course, implies that each operator must have two operands and that each subtree must be
reduced before being used as an operand.
In Polish notation this becomes *a+bc, and in reverse Polish notation (RPN) it becomes abc+*.
The visible advantage of the Polish notations is that there is no need for parentheses (which is the
main reason why they were invented). The reverse notation came later and is more
'computer-friendly' in terms of the resources used.
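As an illustration, the expression tree for a * (b + c) can be turned into both notations with a preorder and a postorder walk. The class name ExprNode and its methods are assumptions made for this sketch, which handles binary operators only.

```java
// Node of a binary expression tree (illustrative sketch; names are hypothetical).
class ExprNode {
    final String token;          // an operator such as "*" or an operand such as "a"
    final ExprNode left, right;  // both null for an operand

    ExprNode(String token) {
        this(token, null, null);
    }

    ExprNode(String token, ExprNode left, ExprNode right) {
        this.token = token;
        this.left = left;
        this.right = right;
    }

    // Preorder walk: operator, then left operand, then right operand.
    String toPrefix() {
        if (left == null) return token;
        return token + left.toPrefix() + right.toPrefix();
    }

    // Postorder walk: left operand, then right operand, then operator.
    String toPostfix() {
        if (left == null) return token;
        return left.toPostfix() + right.toPostfix() + token;
    }

    // Builds the tree for the infix expression a * (b + c).
    static ExprNode example() {
        ExprNode sum = new ExprNode("+", new ExprNode("b"), new ExprNode("c"));
        return new ExprNode("*", new ExprNode("a"), sum);
    }

    public static void main(String[] args) {
        System.out.println(example().toPrefix());  // prints *a+bc
        System.out.println(example().toPostfix()); // prints abc+*
    }
}
```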
Binary Tree Characteristics
Binary Trees are called balanced (or full or tightly packed) if "each level contains
about as twice as many elements as the previous one" (Deitel 2010).
This makes them ideal for searching quickly. For example, if a tightly packed binary
search tree contains n elements, then searching it is O(log n) in Big O notation.
This means that searching a 1000-element balanced tree takes at most 10 comparisons
to find the element, since 2^10 = 1024 > 1000.
Let's take again our previous example of the letters from A to H. As we can see, that tree is
unbalanced (the left subtree has many more elements than the right one).
To create a balanced tree from an ordered list of elements, we start by creating the
root node from the element in the middle; the left child is then the element in
the middle of the left half of the sequence, the right child is the middle
element of the right half of the sequence, and so on.
Let's clarify it with an example:
A, B, C, D, E, F, G, H
        D
       / \
      /   \
     B     G
    / \   / \
   A   C E   H
          \
           F
Here we don't have a tightly packed tree because of the number of its elements, but let's
analyze the worst-case scenario, in which we search for the letter 'F'.
1. F is greater than D so we go right
2. F is lower than G so we go left
3. F is greater than E so we go right
4. F is the value of the node
We made 4 comparisons. If the elements were stored in a linear structure (like an array), in the
worst case we would have made 8 comparisons (if the element searched for is in the last
position).
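The construction and the comparison count can be sketched as follows. The class BalancedBuild and its methods are assumptions for this illustration. Note that the sketch always picks the lower of the two middle elements, so the resulting tree shape differs slightly from the one drawn in this post, although the worst case for the eight letters is still four comparisons.

```java
class BalancedBuild {
    static class Node {
        final String value;
        Node left, right;

        Node(String value) { this.value = value; }
    }

    // Build a tree from the sorted slice [lo, hi], using the lower middle element as the root.
    static Node build(String[] sorted, int lo, int hi) {
        if (lo > hi) return null;
        int mid = lo + (hi - lo) / 2;
        Node node = new Node(sorted[mid]);
        node.left = build(sorted, lo, mid - 1);
        node.right = build(sorted, mid + 1, hi);
        return node;
    }

    // Number of nodes compared while searching for target, or -1 if it is absent.
    static int comparisons(Node root, String target) {
        int count = 0;
        Node node = root;
        while (node != null) {
            count++;
            int c = target.compareTo(node.value);
            if (c == 0) return count;
            node = c < 0 ? node.left : node.right;
        }
        return -1;
    }

    // Largest number of comparisons needed to find any element of the sorted input.
    static int worstCase(String[] sorted) {
        Node root = build(sorted, 0, sorted.length - 1);
        int worst = 0;
        for (String s : sorted) {
            worst = Math.max(worst, comparisons(root, s));
        }
        return worst;
    }

    public static void main(String[] args) {
        String[] letters = {"A", "B", "C", "D", "E", "F", "G", "H"};
        System.out.println(worstCase(letters)); // prints 4
    }
}
```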
Conclusions
In this short introduction to binary trees we just scratched the surface of their nature and how they
can be used.
There are properties not taken into account to make the document concise, nonetheless it's already
possible to observe their potential (especially when handling thousands of elements).
References:
Deitel, P and Deitel, H (2010), 'Chapter 22: Custom Generic Data Structures', in Java: How to
Program (8th Edition), Prentice Hall, pp. 918, 942.
Brookshear, JG (2009), 'Chapter 8: Data Abstraction', in Computer Science: An Overview (10th
Edition), Addison-Wesley Educational Publishers Inc, pp. 393, 407-410, 413.
Lerma, MA (2004), Notes on Discrete Mathematics [Online]. Available from
http://www.math.northwestern.edu/~mlerma/courses/cs310-04w/notes/dm-all.pdf (Accessed: 11
July 2010).
4)
For this discussion question I would like to describe the Map data structure, even though it was
not included in the Seminar, for two main reasons:
1. The map data structure and its implementations in the Core API are widely adopted in
software solutions.
2. I received a number of questions in previous weeks every time I used maps, so I
thought it could be useful to provide an overview.
Map Data Structure
A Map is a data structure in which each value is associated with a key. Each item in a map is
therefore a <Key, Value> pair in which the key is unique. The best-known specialization of this
structure is the Hash Map (also known as Hash Table). The Hash Map organizes a set of items so
as to optimize the search for a specific value. Each item is associated with a unique key, usually
obtained by applying an injective function to the value.
According to Wikipedia (injective function, n.d.) “The function f is injective if for all a and b in A, if
f(a) = f(b), then a = b; that is, f(a) = f(b) implies a = b. Equivalently, if a ≠ b, then f(a) ≠ f(b).”
For example, we can consider the function f that associates each non-negative integer with its
square:
For each a in A, where a is an integer and a >= 0, f(a) = a^2.
The map will then be populated in this way:
KEY     VALUE
9       3
144     12
49      7
….      ….
121     11
Here, on the first row the value to store is 3 and the key is the square of 3, that is 9; on the
second row the value to store is 12 and the key is the square of 12, that is 144; and so on.
A structure like this can very quickly answer the question:
“Does the structure contain the value X?”
To do so, we calculate the square of X and check whether the result is a key in the Map.
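The squares example can be sketched with java.util.HashMap. The class SquareMapDemo and its two methods are assumptions made for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class SquareMapDemo {
    // Store each value under the key f(value) = value * value.
    static Map<Integer, Integer> build(int... values) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int v : values) {
            map.put(v * v, v);
        }
        return map;
    }

    // "Does the structure contain the value x?" becomes a single key lookup.
    static boolean containsValue(Map<Integer, Integer> map, int x) {
        return map.containsKey(x * x);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = build(3, 12, 7, 11);
        System.out.println(containsValue(map, 12)); // prints true
        System.out.println(containsValue(map, 5));  // prints false
    }
}
```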
The power of the Hash Map is even more evident if we consider a more complex scenario. We could,
for example, use a Hash Map to create a dictionary, where the key is calculated from the word to
explain and the value is the definition of that word. We cannot simply use the word itself as the
key, since this could produce duplicate keys, which are not allowed, for example when two words
are spelled the same but have different meanings. We need to apply an injective function that
calculates a unique value associated with a specific word. Finding a suitable key for each value is
not easy: the key must be easy to calculate, and it must also avoid collisions, which occur when
two distinct values are associated with the same key.
Map implementation in Java Core API
Map is one of the interfaces of the Java Collections Framework; notably, it does not extend the
Collection interface. Map provides collection views of its elements, making it possible, for
example, to iterate over the key set, the collection of values and the entry set, which is
the set of <Key,Value> pairs defined in the map (The Map Interface, n.d., Java tutorial).
HashMap is one of the implementations of the Map interface, and it implements the approach that
has been described above.
To help create a key for inserting objects in a Map, Java defines in the class Object the
method:
public int hashCode();
which can be overridden in order to define class-specific behaviour.
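The three collection views can be sketched as follows. The class MapViewsDemo and the method describe are assumptions for this example. A TreeMap is used in main only because it iterates its keys in sorted order, which keeps the printed result predictable; a HashMap offers the same three views without any ordering guarantee.

```java
import java.util.Map;
import java.util.TreeMap;

class MapViewsDemo {
    // Walk the three collection views of a map and summarise what each one exposes.
    static String describe(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder("keys:");
        for (String key : map.keySet()) {
            sb.append(' ').append(key);
        }
        sb.append(" | values:");
        for (Integer value : map.values()) {
            sb.append(' ').append(value);
        }
        sb.append(" | entries:");
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sb.append(' ').append(entry.getKey()).append('=').append(entry.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> extensions = new TreeMap<>();
        extensions.put("John", 101);
        extensions.put("Anna", 102);
        // prints keys: Anna John | values: 102 101 | entries: Anna=102 John=101
        System.out.println(describe(extensions));
    }
}
```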
Let's consider, for example, a problem similar to last week's discussion question, which required
an application able to manage a set of contacts, each containing name, surname, telephone
number, etc.
We can define the class Contact with these instance variables:
String firstName;
String secondName;
String telephoneNumber;
A HashMap object could be used to organize the Contact objects entered by the user.
We can then override the hashCode() method in order to generate a numeric key from
the instance variable values:
public class Contact {
    String firstName;
    String secondName;
    String telephoneNumber;
    . . .
    . . .
    . . .
    @Override // this is an override of the method defined in java.lang.Object
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + firstName.hashCode();
        result = prime * result + secondName.hashCode();
        result = prime * result + telephoneNumber.hashCode();
        return result;
    }
}
Multiplying by a prime number such as 31 helps to spread the hash codes, so objects with different
instance variables will usually, although not always, produce different values.
Let us assume contactsMap is an instance of java.util.HashMap whose keys are the instances of the
Contact class entered by the user.
If we want to know whether the map contains an object having:
firstName: “John”
secondName: “Kersil”
telephone: “0123232323”
we can either:
- create an instance of the class Contact with those fields, for example tempContact, and
invoke contactsMap.containsKey(tempContact), which returns true if the map contains that
key and false otherwise (this relies on Contact also overriding equals(), discussed below)
- create a method that uses the same algorithm defined in the Contact class, for example
something like:
public int calculateHashCodeForContact(String firstName, String secondName, String telephone) {
    // applies the same algorithm as hashCode() in the Contact class
    return 31 * (31 * (31 + firstName.hashCode()) + secondName.hashCode()) + telephone.hashCode();
}
The problem presented would be solved even better if the hashCode() method in the Contact class
(and equals() with it) considered only firstName and secondName. This is because we could assume
that every contact with the same name and surname refers to one specific person, and treat
the telephone number (which can change) as irrelevant for uniquely identifying a contact.
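A minimal sketch of that variant follows. The class name NamedContact and its constructor are assumptions made for this illustration, and the helper class java.util.Objects requires Java 7 or later. Because both hashCode() and equals() ignore the telephone number, a lookup still succeeds after the number has changed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Contact identified by name only (illustrative sketch; names are hypothetical).
class NamedContact {
    final String firstName;
    final String secondName;
    String telephoneNumber; // may change, so it is deliberately not part of the identity

    NamedContact(String firstName, String secondName, String telephoneNumber) {
        this.firstName = firstName;
        this.secondName = secondName;
        this.telephoneNumber = telephoneNumber;
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstName, secondName);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof NamedContact)) return false; // also rejects null
        NamedContact other = (NamedContact) obj;
        return Objects.equals(firstName, other.firstName)
                && Objects.equals(secondName, other.secondName);
    }

    public static void main(String[] args) {
        Map<NamedContact, String> contactsMap = new HashMap<>();
        NamedContact john = new NamedContact("John", "Kersil", "0123232323");
        contactsMap.put(john, "friend");
        john.telephoneNumber = "0999999999"; // the key still hashes to the same bucket
        NamedContact tempContact = new NamedContact("John", "Kersil", "unknown");
        System.out.println(contactsMap.containsKey(tempContact)); // prints true
    }
}
```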
In conclusion, having talked about the hashCode method, I would like to mention the contract
between the hashCode and equals methods. Both methods are defined in the java.lang.Object class,
and their purpose is:
- for hashCode(), to return a number associated with the object, calculated from
all or some of the instance variables
- for equals(Object o), to check whether the object provided is logically equivalent to the
current instance.
The following is a possible override of equals in the class Contact:
public class Contact {
    String firstName;
    String secondName;
    String telephoneNumber;
    . . .
    . . .
    . . .
    @Override // this is an override of the method defined in java.lang.Object
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + firstName.hashCode();
        result = prime * result + secondName.hashCode();
        result = prime * result + telephoneNumber.hashCode();
        return result;
    }
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        // reject null and objects of another class before casting
        if (obj == null || getClass() != obj.getClass())
            return false;
        Contact other = (Contact) obj;
        // compare the fields of the cast reference, not of the raw Object
        if (!firstName.equals(other.firstName))
            return false;
        if (!secondName.equals(other.secondName))
            return false;
        if (!telephoneNumber.equals(other.telephoneNumber))
            return false;
        return true;
    }
}
Java defines the following contract between the equals and hashCode methods:
if two objects are equal according to the equals(...) method, then invoking hashCode() on each
of them must produce the same value (java.lang.Object, n.d.)
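The contract only runs in one direction: equal objects must share a hash code, but objects that share a hash code need not be equal. The sketch below illustrates this with two standard String values that happen to collide. The class HashContractDemo and the method respectsContract are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

class HashContractDemo {
    // True when the pair respects the contract: equal objects must have equal hash codes.
    static boolean respectsContract(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Two distinct but equal strings: the hash codes must match.
        String x = new String("John");
        String y = new String("John");
        System.out.println(x.equals(y) && x.hashCode() == y.hashCode()); // prints true

        // The converse is not required: "Aa" and "BB" are different strings
        // that share the hash code 2112.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // prints true
        System.out.println("Aa".equals("BB"));                  // prints false

        // A HashMap copes with the collision by falling back on equals().
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.size()); // prints 2
    }
}
```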
References
Injective Functions (2010) Wikipedia [Online]. Available from:
http://en.wikipedia.org/wiki/Injective_function
(Accessed on 11th July 2010)
Map Interface (n.d.) The Java Tutorials [Online]. Available from:
http://download.oracle.com/docs/cd/E17409_01/javase/tutorial/collections/interfaces/map.html
(Accessed on 11th July 2010)
java.lang.Object equals method (n.d.) Java Standard Platform 6.0 [Online]. Available from:
http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/lang/Object.html#equals%28java.lang.Object%29
(Accessed on 11th July 2010)
Questions:
What are the types of trees?
Which type of tree provides faster searching?
What is the difference between a tree and a graph?