Data Structures - Introduction

A data structure is a logical or mathematical organization of data; it describes how data is stored in memory and how it is accessed. When a program runs, its data is held in main memory (RAM), so developing efficient software or firmware means using that memory carefully, and data structures are what let us manage it efficiently.

There are two different types of data structure:

  1. Linear Data Structure: In a linear data structure, data elements are arranged in a sequence, each element (except the first and last) having one predecessor and one successor. Array, Stack, Queue, and Linked List are examples of linear data structures.
  2. Non-Linear Data Structure: In a non-linear data structure, data elements are not arranged in a single sequence; an element can be connected to several others. Tree and Graph are examples of non-linear data structures.

An algorithm is a step-by-step procedure that defines a set of instructions to be executed in a certain order to get the desired output. Algorithms are generally designed independently of any programming language, i.e. an algorithm can be implemented in more than one programming language. From the data structure point of view, the following are some important categories of algorithms −

  • Search − Algorithm to search an item in a data structure.
  • Sort − Algorithm to sort items in a certain order.
  • Insert − Algorithm to insert an item in a data structure.
  • Update − Algorithm to update an existing item in a data structure.
  • Delete − Algorithm to delete an existing item from a data structure.
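As a rough illustration of these categories, the sketch below applies search, insert, update, delete, and sort to a plain C array. It is a minimal sketch, assuming a fixed-capacity int array whose used length is tracked separately; the function names (arr_search, arr_insert, and so on) are hypothetical, and insertion sort is just one possible choice for the sort step.

```c
#include <stddef.h>

/* Search: return the index of the first element equal to key, or -1. */
int arr_search(const int *a, size_t len, int key) {
    for (size_t i = 0; i < len; i++)
        if (a[i] == key)
            return (int)i;
    return -1;
}

/* Insert: place value at position pos, shifting later elements right.
 * Returns 0 on success, -1 if the array is full or pos is out of range. */
int arr_insert(int *a, size_t *len, size_t cap, size_t pos, int value) {
    if (*len >= cap || pos > *len)
        return -1;
    for (size_t i = *len; i > pos; i--)
        a[i] = a[i - 1];
    a[pos] = value;
    (*len)++;
    return 0;
}

/* Update: overwrite the element at pos. Returns 0 on success, -1 otherwise. */
int arr_update(int *a, size_t len, size_t pos, int value) {
    if (pos >= len)
        return -1;
    a[pos] = value;
    return 0;
}

/* Delete: remove the element at pos, shifting later elements left. */
int arr_delete(int *a, size_t *len, size_t pos) {
    if (pos >= *len)
        return -1;
    for (size_t i = pos; i + 1 < *len; i++)
        a[i] = a[i + 1];
    (*len)--;
    return 0;
}

/* Sort: insertion sort into ascending order. */
void arr_sort(int *a, size_t len) {
    for (size_t i = 1; i < len; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];   /* shift larger elements one slot right */
            j--;
        }
        a[j] = key;
    }
}
```

Insertion and deletion shift the elements after pos, so in the worst case they touch every element; search is a linear scan over the used part of the array.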

Characteristics of an Algorithm

Not all procedures can be called an algorithm. An algorithm should have the following characteristics −

  • Unambiguous − The algorithm should be clear and unambiguous. Each of its steps (or phases) and their inputs/outputs should be clear and must lead to only one meaning.
  • Input − An algorithm should have 0 or more well-defined inputs.
  • Output − An algorithm should have 1 or more well-defined outputs and should match the desired output.
  • Finiteness − Algorithms must terminate after a finite number of steps.
  • Feasibility − Should be feasible with the available resources.
  • Independent − An algorithm should have step-by-step directions, which should be independent of any programming code.
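Euclid's algorithm for the greatest common divisor (not from the text above; chosen only as a familiar example) shows these characteristics in a few lines of C: it takes two well-defined inputs, produces exactly one output, each step is unambiguous, and it always terminates. The function name gcd is illustrative.

```c
/* Euclid's algorithm for the greatest common divisor.
 *   Input:       two non-negative integers a and b
 *   Output:      exactly one value, gcd(a, b)
 *   Finiteness:  b strictly decreases on every iteration (a % b < b),
 *                so the loop stops after finitely many steps
 *   Unambiguous: each step is a single, well-defined operation */
unsigned gcd(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned r = a % b;  /* remainder is always smaller than b */
        a = b;
        b = r;
    }
    return a;                /* gcd(a, 0) == a */
}
```

The same steps could be written in any other language, which is the independence property from the list.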

Algorithm Analysis

The efficiency of an algorithm can be analyzed at two different stages: before implementation and after implementation. They are the following −

  • A Priori Analysis − This is a theoretical analysis of an algorithm. The efficiency of an algorithm is measured by assuming that all other factors, for example, processor speed, are constant and have no effect on the implementation.
  • A Posteriori Analysis − This is an empirical analysis of an algorithm. The selected algorithm is implemented in a programming language and then executed on the target machine. In this analysis, actual statistics such as running time and space required are collected.
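An a posteriori measurement might look like the following sketch, which times a simple summation loop with the standard C clock() function. The workload (sum_to) and helper (time_sum_to) are hypothetical; clock() measures processor time, and the numbers it yields depend on the machine, the compiler, and its optimization settings.

```c
#include <time.h>

/* Sample workload: add the integers 1..n with a simple loop. */
long long sum_to(long long n) {
    long long total = 0;
    for (long long i = 1; i <= n; i++)
        total += i;
    return total;
}

/* A posteriori measurement: run the implemented algorithm and record the
 * processor time it actually used, in seconds. Results vary between
 * machines, compilers, and runs. */
double time_sum_to(long long n, long long *result) {
    clock_t start = clock();
    *result = sum_to(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```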

We shall learn about a priori algorithm analysis. Algorithm analysis deals with the execution or running time of various operations involved. The running time of an operation can be defined as the number of computer instructions executed per operation.

Algorithm Complexity

Suppose X is an algorithm and n is the size of the input data. The time and space used by X are the two main factors that decide the efficiency of X.

  • Time Factor − Time is measured by counting the number of key operations, such as comparisons in a sorting algorithm.
  • Space Factor − Space is measured by counting the maximum memory space required by the algorithm.

The complexity of an algorithm, f(n), gives the running time and/or storage space required by the algorithm as a function of n, the size of the input data.
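To make the time factor concrete, the sketch below counts comparisons, the key operation of a sort. It assumes selection sort (not named in the text; chosen because its comparison count does not depend on the input order), so f(n) = n(n-1)/2 exactly. It sorts in place, so its extra space does not grow with n. The function name is hypothetical.

```c
#include <stddef.h>

/* Sort a[0..n-1] ascending with selection sort and return the number of
 * element comparisons performed (the "key operation" count).
 * Selection sort always makes n(n-1)/2 comparisons. */
unsigned long selection_sort_count(int *a, size_t n) {
    unsigned long comparisons = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        size_t min = i;
        for (size_t j = i + 1; j < n; j++) {
            comparisons++;              /* one key comparison */
            if (a[j] < a[min])
                min = j;
        }
        int tmp = a[i];                 /* swap the minimum into place */
        a[i] = a[min];
        a[min] = tmp;
    }
    return comparisons;
}
```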

Asymptotic Analysis

Asymptotic analysis of an algorithm defines the mathematical framework for describing its run-time performance. Using asymptotic analysis, we can characterize the best-case, average-case, and worst-case behavior of an algorithm. Asymptotic analysis is input bound, i.e. if there is no input to the algorithm, it is concluded to work in constant time; all factors other than the input are considered constant. Asymptotic analysis computes the running time of an operation in mathematical units of computation. For example, the running time of one operation may be computed as f(n) = n and that of another as g(n) = n². The running time of the first operation then increases linearly as n increases, while the running time of the second increases quadratically. When n is small, however, the difference between the two running times is small.
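The difference between linear and quadratic growth can be seen by counting the basic steps of a single loop versus a nested loop. This is a minimal sketch; the two functions are hypothetical stand-ins for "one operation" and "another operation".

```c
/* Single loop: one step per element, so the count is n. */
unsigned long linear_steps(unsigned n) {
    unsigned long steps = 0;
    for (unsigned i = 0; i < n; i++)
        steps++;
    return steps;
}

/* Nested loop: one step per pair (i, j), so the count is n * n. */
unsigned long quadratic_steps(unsigned n) {
    unsigned long steps = 0;
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            steps++;
    return steps;
}
```

For n = 10 the counts are 10 and 100; for n = 1000 they are 1,000 and 1,000,000.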

Usually, the time required by an algorithm falls under three types −

  • Best Case − Minimum time required for program execution.
  • Average Case − Average time required for program execution.
  • Worst Case − Maximum time required for program execution.
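Linear search (an assumed example; the text does not name one) shows how the three cases differ for the same input size. The sketch below returns the position of the key and also reports how many comparisons it made through a hypothetical comparisons out-parameter.

```c
#include <stddef.h>

/* Linear search that also reports how many comparisons it made.
 * Best case:  key is at index 0 -> 1 comparison.
 * Worst case: key is last or absent -> n comparisons. */
int linear_search(const int *a, size_t n, int key, size_t *comparisons) {
    *comparisons = 0;
    for (size_t i = 0; i < n; i++) {
        (*comparisons)++;
        if (a[i] == key)
            return (int)i;
    }
    return -1;
}
```

If the key is present and equally likely to be at any of the n positions, the average number of comparisons is (n + 1)/2.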

Asymptotic Notations

The following asymptotic notations are commonly used to express the running-time complexity of an algorithm.

  • Ο Notation
  • Ω Notation
  • Θ Notation

Big Oh Notation, Ο

The notation Ο(f(n)) is the formal way to express an upper bound on an algorithm's running time. It is most often used to state the worst-case time complexity, i.e. the longest amount of time an algorithm can possibly take to complete.

Formally, for a function f(n):

$O(f(n)) = \{\, g(n) : \text{there exist } c > 0 \text{ and } n_0 \text{ such that } g(n) \le c \cdot f(n) \text{ for all } n > n_0 \,\}$
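As a worked example (the function 3n + 2 is chosen for illustration), the upper-bound definition can be checked directly by finding suitable constants c and n₀:

```latex
\begin{aligned}
3n + 2 &\le 3n + n = 4n && \text{for all } n \ge 2 \\
\Rightarrow\; 3n + 2 &\le c \cdot n && \text{for all } n > n_0, \text{ with } c = 4,\ n_0 = 2 \\
\Rightarrow\; 3n + 2 &\in O(n)
\end{aligned}
```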

Omega Notation, Ω

The notation Ω(f(n)) is the formal way to express a lower bound on an algorithm's running time. It is most often used to state the best-case time complexity, i.e. the least amount of time an algorithm can possibly take to complete.

Formally, for a function f(n):

$\Omega(f(n)) = \{\, g(n) : \text{there exist } c > 0 \text{ and } n_0 \text{ such that } g(n) \ge c \cdot f(n) \text{ for all } n > n_0 \,\}$
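For example (with 3n + 2 chosen for illustration), a lower bound of the same form follows by dropping the constant term:

```latex
\begin{aligned}
3n + 2 &\ge 3n && \text{for all } n \ge 0 \\
\Rightarrow\; 3n + 2 &\ge c \cdot n && \text{for all } n > n_0, \text{ with } c = 3,\ n_0 = 0 \\
\Rightarrow\; 3n + 2 &\in \Omega(n)
\end{aligned}
```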

Theta Notation, Θ

The notation Θ(f(n)) is the formal way to express both a lower bound and an upper bound on an algorithm's running time, i.e. a tight bound. It is represented as follows −

$\Theta(f(n)) = \{\, g(n) : g(n) \in O(f(n)) \text{ and } g(n) \in \Omega(f(n)) \,\}$
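For example (with 3n + 2 chosen for illustration), the function is bounded above and below by constant multiples of n, so it is a tight bound:

```latex
\begin{aligned}
3n \le 3n + 2 &\le 4n && \text{for all } n \ge 2 \\
\Rightarrow\; 3n + 2 &\in O(n) \text{ and } 3n + 2 \in \Omega(n) \\
\Rightarrow\; 3n + 2 &\in \Theta(n)
\end{aligned}
```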

Common Asymptotic Notations

The following are some common asymptotic notations −

  • constant − Ο(1)
  • logarithmic − Ο(log n)
  • linear − Ο(n)
  • n log n − Ο(n log n)
  • quadratic − Ο(n²)
  • cubic − Ο(n³)
  • polynomial − n^Ο(1)
  • exponential − 2^Ο(n)
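As one concrete entry from this list, binary search on a sorted array is a standard Ο(log n) algorithm: each step halves the remaining range. The sketch below is a minimal version, assuming an int array sorted in ascending order; binary_search is a hypothetical name (it is unrelated to the C library's bsearch).

```c
#include <stddef.h>

/* Binary search on an array sorted in ascending order.
 * Each iteration halves the range [lo, hi), so at most about
 * log2(n) + 1 iterations run.
 * Returns the index of key, or -1 if it is absent. */
int binary_search(const int *a, size_t n, int key) {
    size_t lo = 0, hi = n;                /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;  /* avoids overflow of lo + hi */
        if (a[mid] == key)
            return (int)mid;
        else if (a[mid] < key)
            lo = mid + 1;                 /* key can only be to the right */
        else
            hi = mid;                     /* key can only be to the left */
    }
    return -1;
}
```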

A data structure mainly specifies the following:

  1. Organization of Data
  2. Accessing methods
  3. Degree of associativity
  4. Processing alternatives for information

Data structures are the building blocks of a program. The selection of a particular structure depends on the following two considerations:

  1. The structure must be rich enough to reflect the relationships that exist between the data items.
  2. The structure should be simple enough that the data can be processed efficiently whenever required.

Data structures can be classified into two broad categories:

  • Primitive data structure
  • Non-primitive data structure

1) Primitive Data Structure

Primitive data structures are basic structures that are directly operated upon by machine instructions. Their representation (for example, their size in bytes) can differ from one computer to another. Primitive data structures are divided into four categories:

  • Integer
  • Floating point numbers
  • Character constants
  • Pointers
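In C, the four categories map onto built-in types, as in the sketch below. The variable and function names are hypothetical, and double stands in for floating-point numbers in general (float and long double also exist). The sizes reported differ between compilers and machines; only sizeof(char) is fixed at 1 by the C standard.

```c
#include <stddef.h>

/* One variable from each primitive category (names are hypothetical). */
int    int_value   = 42;          /* integer */
double float_value = 0.75;        /* floating-point number */
char   char_value  = 'A';         /* character constant */
int   *ptr_value   = &int_value;  /* pointer: stores the address of int_value */

/* Report the size in bytes of each primitive, in the order above.
 * Only sizeof(char) is fixed (always 1); the others depend on the
 * compiler and target machine. */
void primitive_sizes(size_t sizes[4]) {
    sizes[0] = sizeof int_value;
    sizes[1] = sizeof float_value;
    sizes[2] = sizeof char_value;
    sizes[3] = sizeof ptr_value;
}
```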

2) Non-primitive data structure

Non-primitive data structures are more sophisticated data structures derived from the primitive ones. They emphasize structuring a group of homogeneous (same type) or heterogeneous (different type) data items.
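The difference between homogeneous and heterogeneous grouping shows up directly in C: an array groups items of one type, while a struct can group items of different types. The names below (scores, struct student, example_student) are hypothetical.

```c
/* Homogeneous grouping: an array whose elements all have the same type. */
int scores[3] = {90, 75, 82};

/* Heterogeneous grouping: a struct whose members have different types. */
struct student {
    char   name[16];   /* characters */
    int    id;         /* integer */
    double gpa;        /* floating-point number */
};

struct student example_student = {"Asha", 1024, 3.5};
```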

Non-primitive data structures are categorized as follows:

  • Array
  • Linked list
  • Queue
  • Tree
  • Graph
  • Stack

Array

An array is a non-primitive data type. It is defined as a collection of elements of the same type, i.e. a set of homogeneous data items. This means an array can contain only one type of data: for example, all integers, all floating-point numbers, or all characters. An array is declared as follows:

int A[10];

Here, int specifies the data type of the elements the array stores, A is the name of the array, and the number inside the square brackets (the subscript) is the number of elements the array can store; this is also called the size or length of the array.
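The declaration int A[10]; can be used as in the following sketch. The LENGTH macro and fill_and_sum function are hypothetical helpers; the macro is a common C idiom that works only on a real array, not on a pointer. C subscripts start at 0, so A has valid indices 0 through 9.

```c
#include <stddef.h>

int A[10];   /* 10 ints, valid indices 0..9 */

/* Number of elements, computed from the array's total size.
 * Works only on a true array, not on a pointer to its first element. */
#define LENGTH(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Store i * i in each slot and return the sum of all elements. */
int fill_and_sum(void) {
    int sum = 0;
    for (size_t i = 0; i < LENGTH(A); i++) {
        A[i] = (int)(i * i);   /* elements are accessed by subscript */
        sum += A[i];
    }
    return sum;
}
```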

Linked list

A linked list can be defined as a collection of a variable number of data items. Lists are among the most commonly used non-primitive data structures. Each element (node) of a linked list consists of two parts: one part holds the value, and the other stores the address of the next element of the list.
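A singly linked list in C might be sketched as follows, with each node holding a value part and a next pointer. The struct and function names are hypothetical, and the list is built by inserting at the front for simplicity.

```c
#include <stdlib.h>

/* One element (node) of a singly linked list. */
struct node {
    int value;            /* the data held by this element */
    struct node *next;    /* address of the next element, or NULL at the end */
};

/* Insert value at the front of the list. Returns 0 on success,
 * -1 if malloc fails (the list is then left unchanged). */
int push_front(struct node **head, int value) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = *head;      /* new node links to the old first element */
    *head = n;
    return 0;
}

/* Walk the list by following next pointers and count the elements. */
size_t list_length(const struct node *head) {
    size_t len = 0;
    for (; head != NULL; head = head->next)
        len++;
    return len;
}

/* Free every node in the list. */
void free_list(struct node *head) {
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```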

Queue

A queue is a first-in, first-out (FIFO) data structure. In a queue, new elements are added at one end, called the REAR, and elements are removed from the other end, called the FRONT.
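One common way to implement a queue is a circular array, sketched below under the assumption of a small fixed capacity (QUEUE_CAP is a hypothetical constant, and the function names are illustrative). The front and rear indices wrap around to the start of the array, so space freed at FRONT can be reused at REAR.

```c
#include <stdbool.h>

#define QUEUE_CAP 8   /* hypothetical fixed capacity */

/* Array-based circular queue: elements enter at REAR and leave at FRONT. */
struct queue {
    int items[QUEUE_CAP];
    int front;   /* index of the element to remove next */
    int rear;    /* index where the next element will be added */
    int count;   /* number of elements currently stored */
};

void queue_init(struct queue *q) {
    q->front = 0;
    q->rear = 0;
    q->count = 0;
}

/* Add value at REAR. Returns false if the queue is full. */
bool enqueue(struct queue *q, int value) {
    if (q->count == QUEUE_CAP)
        return false;
    q->items[q->rear] = value;
    q->rear = (q->rear + 1) % QUEUE_CAP;   /* wrap around the array */
    q->count++;
    return true;
}

/* Remove the element at FRONT into *out. Returns false if empty. */
bool dequeue(struct queue *q, int *out) {
    if (q->count == 0)
        return false;
    *out = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_CAP;
    q->count--;
    return true;
}
```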

Tree

A tree can be defined as a finite set of data items called nodes. A tree is a non-linear data structure in which data items are arranged hierarchically rather than in a sequence: one node is the root, and every other node has exactly one parent. Trees therefore represent hierarchical relationships between elements. By convention, a tree is drawn with the root at the top, growing downward.
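A binary tree, where each node has at most two children, is one common kind of tree. The sketch below (struct and function names are hypothetical) builds nodes with child pointers and computes the number of nodes and the height, i.e. the number of levels from the root down to the deepest node.

```c
#include <stdlib.h>

/* A node of a binary tree: each node has at most two children. */
struct tree_node {
    int value;
    struct tree_node *left;   /* left child, or NULL */
    struct tree_node *right;  /* right child, or NULL */
};

/* Allocate a node with the given children; returns NULL if malloc fails. */
struct tree_node *make_node(int value, struct tree_node *left,
                            struct tree_node *right) {
    struct tree_node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->value = value;
        n->left = left;
        n->right = right;
    }
    return n;
}

/* Number of nodes in the tree rooted at t. */
size_t tree_size(const struct tree_node *t) {
    return t == NULL ? 0 : 1 + tree_size(t->left) + tree_size(t->right);
}

/* Height: number of levels from the root down to the deepest node. */
size_t tree_height(const struct tree_node *t) {
    if (t == NULL)
        return 0;
    size_t l = tree_height(t->left), r = tree_height(t->right);
    return 1 + (l > r ? l : r);
}

/* Free the children first, then the node itself. */
void free_tree(struct tree_node *t) {
    if (t == NULL)
        return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}
```

For example, a root 1 with children 2 and 3, where 2 has a child 4, has 4 nodes and height 3.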

Graph

A graph G = (V, E) consists of a set of vertices V and a set of edges E, where each edge connects a pair of vertices. Vertices are drawn as points or circles, and edges as arcs or line segments.

There are two types of graph:

  • Undirected graph
  • Directed graph
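The two kinds of graph differ in whether an edge has a direction. The sketch below stores G = (V, E) as an adjacency matrix, one common representation among several (adjacency lists are another); MAX_V and the function names are hypothetical. In the undirected case each edge is recorded in both directions.

```c
#include <stdbool.h>

#define MAX_V 5   /* hypothetical maximum number of vertices */

/* Graph G = (V, E) as an adjacency matrix:
 * adj[u][v] is true when there is an edge from u to v. */
struct graph {
    int n;                   /* number of vertices, at most MAX_V */
    bool directed;
    bool adj[MAX_V][MAX_V];
};

void graph_init(struct graph *g, int n, bool directed) {
    g->n = n;
    g->directed = directed;
    for (int u = 0; u < MAX_V; u++)
        for (int v = 0; v < MAX_V; v++)
            g->adj[u][v] = false;
}

/* Add edge (u, v); u and v must be valid indices below n (not checked).
 * An undirected edge works both ways, so the entry is mirrored. */
void add_edge(struct graph *g, int u, int v) {
    g->adj[u][v] = true;
    if (!g->directed)
        g->adj[v][u] = true;
}

/* Count edges leaving u (its out-degree; simply its degree if undirected). */
int out_degree(const struct graph *g, int u) {
    int d = 0;
    for (int v = 0; v < g->n; v++)
        if (g->adj[u][v])
            d++;
    return d;
}
```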