We have seen previous that large amounts of data often require searches to find certain pieces. In addition, that data often has to also be sorted. We can sort by any number of keys: relevance, alphabetical, value, etc. The process of sorting can also lead to more efficient searching.
There are a variety of sorting algorithms available for use. Different algorithms excel is different situations. As previously discussed, Big O notation, as well as best/worst/average case scenarios are used to determine the best algorithm to use for a given situation.
Bubble Sort
Bubble sort is one of the simplest algorithms for sorting a set of data. The bubble sort makes multiple passes through a list. It compares adjacent items and exchanges those that are out of order. Each pass through the list places the next largest value in its proper place. In essence, each item “bubbles” up to the location where it belongs. Bubble Sort does not have a best/worst case as it always runs the same making n-1 passes on n element. Thus the Big O efficiency for Bubble Sort is O(n2).
Bubble sort is, most often, implemented as an iterative algorithm.
The basic code for bubble sort is as follows:
int[]arr={4,5,3,8,1};
for(int x=0; x<arr.length; x++){ //cycle through array n times, once for each element
for(int y=0; y<arr.length-1; y++){ //cycle though array n-1 times, preparing to compare to next element
if(arr[y]>arr[y+1]){ //if the next element is bigger, swap them
int temp=arr[y];
arr[y]=arr[y+1];
arr[y+1]=temp;
}
}
}
Insertion Sort
Insertion Sort is a simple algorithm for sorting a set of data. In Insertion Sort, we declare part of the array as sorted, starting with only the first element. Then, as we move through the list, we take the next element in order, and place it appropriately in the "sorted" portion of the array. For example, when we reach the middle element in a list, the first half of the list will be in sorted order and the second half of the list will not. The best case scenario is that the list is in order, leaving us with a O(n) or linear efficiency. The worst cast is the list in reverse order in which the most comparisons have to be made. Both the worst and average cases have an efficiency of O(n2).
Insertion sort is, most often, implemented as an iterative algorithm.
The basic code for insertion sort is as follows:
int [] arr={4,6,3,7,8};
for(int x=1; x<arr.length; x++){ //cycle through each element of the array
int current=arr[x]; //get the current unsorted value
int y=x; //set a counter for the sorted part of the array
while(y>0 && arr[y-1]>current){ //continue while the item to the left is bigger than the current item
arr[y] = arr[y-1]; //shift the item
y--; //decrement the counter
}
arr[y]=current; //put the current value in its sorted place
}
Selection Sort
Selection Sort is a simple algorithm for sorting a set of data. In selection sort, we search the entire array for the smallest element, and then place it in the first position. We then do the same for the 2nd smallest, and so on. The best case/worst case and average case for selection sort are all the same. In each case, the loops always run the same number of times, so we find the Big O time is (n-1)*n/2 ==> O(n2). This is known as quadratic.
The basic code for a selection sort is as follows:
int [] arr={4,6,3,7,8};
for(int x=0; x<arr.length-1; x++){ //cycle through each element of the array
int minPos=x; //set the location of the smallest value
for(int y=x+1; y<arr.length; y++){ //check each element after the current element
if(arr[y]<arr[min]){ //check to see if the current value is smaller than min
minPos=y; //set minPos to new value
}
swap(arr, i, min); //call a swap method to switch the current index with the min value
}
}
The swap function is a standard function used to switch two values in two variables. If we want to swap the values in a and b, we store a in a temp variable. Then, we store b in a. And finally, we put the temp value into b.
temp=a;
a=b;
b=temp;
Searching
Information storage is big business. Along with storage, comes the need to access information in a variety of ways. One of the more common access methods is that of searching. We search all the time on Google, accessing their extensive database of web pages and content. On a smaller scaler, we have been writing programs that store significant amounts of information in arrays. Often, we need to search through an array to locate the index of a given item (a student id, a specific score, etc.). In the case of parallel arrays of related information, finding the index of one item provides us with lots of other information about that thing.
There are a variety of search algorithms available for use. Different algorithms excel in different situations. This type of decision making is often discussed in a data structures course and involves informal analysis tools such as best/worst case scenarios as well as formal analysis tools such as Big O Notation. Big O is a method of characterizing the nature of an algorithm in terms of the number of comparisons that need to be run. More on this later.
Sequential Search
Sequential Search is a very simple algorithm for searching an array. In sequential search, each element of the array, in order, is checked against the target. The best case sencario is that the target is in the first slot of the array. The worst case scenario is that the target is either in the last slot OR not in the array. Most searching algorithms return the index of the target in the structure or -1 if the target is not found. In terms of Big O, sequential search is looked aat as a linear timed algorithm or O(n). This means that as the size of the data structure grows, the average number of comparisons grows at the same rate.
The basic code for a sequential search is as follows:
int [] arr={4,6,3,7,8};
int target=7;
int foundIndex=-1;
for(int x=0; x<arr.length; x++){
if(target==arr[x]){
foundIndex=x;
break;
}
}
In this case, the search terminates when the item is found, thus returning the first occurence of the item. By removing the break statement, or by traversing the array in reverse order, we could easily return the last occurence of the item.