Example: Select and Project operations. This is used in operations where the operands are distributed at different sites. This is also appropriate in systems where the communication costs are low, and local processors are much slower than the client server. Here, data fragments are transferred to the high-speed processors, where the operation runs.
The results are then sent to the client site. The buyer formulates a number of alternatives for choosing sellers and for reconstructing the global results.
The target of the buyer is to achieve the optimal cost. The algorithm starts with the buyer assigning sub-queries to the seller sites.
The optimal plan is created from local optimized query plans proposed by the sellers combined with the communication cost for reconstructing the final result. Once the global optimal plan is formulated, the query is executed. Optimal solution generally involves reduction of solution space so that the cost of query and data transfer is reduced. This can be achieved through a set of heuristic rules, just as heuristics in centralized systems. Perform selection and projection operations as early as possible. This reduces the data flow over communication network.
Simplify operations on horizontal fragments by eliminating selection conditions which are not relevant to a particular site. In case of join and union operations comprising of fragments located in multiple sites, transfer fragmented data to the site where most of the data is present and perform operation there. Use semi-join operation to qualify tuples that are to be joined. This reduces the amount of data transfer which in turn reduces communication cost. This chapter discusses the various aspects of transaction processing.
In the last portion, we will look over schedules and serializability of schedules. A transaction is a program including a collection of database operations, executed as a logical unit of data processing. The operations performed in a transaction include one or more of database operations like insert, delete, update or retrieve data.
It is an atomic process that is either performed into completion entirely or is not performed at all. A transaction involving only data retrieval without any data update is called read-only transaction. Each high level operation can be divided into a number of low level tasks or operations. Likewise, for all transactions, read and write forms the basic database operations.
Swipe to navigate through the articles of this issue
A committed transaction cannot be rolled back. A transaction may go through a subset of five states, active, partially committed, committed, failed and aborted. The transaction remains in this state while it is executing read, write or other operations. The following state transition diagram depicts the states in the transaction and the low level transaction operations that causes change in states. Any transaction must maintain the ACID properties, viz. Atomicity, Consistency, Isolation, and Durability.
What is distributed database? - Definition from psychalivesmy.tk
No partial update should exist. It should not adversely affect any data item in the database. There should not be any interference from the other concurrent transactions that are simultaneously running. In a system with a number of simultaneous transactions, a schedule is the total order of execution of operations. Given a schedule S comprising of n transactions, say T1, T2, T3………..
Tn; for any transaction Ti, the operations in Ti must execute as laid down in the schedule S. In a schedule comprising of multiple transactions, a conflict occurs when two active transactions perform non-compatible operations. A serializable schedule contains the correctness of serial schedule while ascertaining better CPU utilization of parallel schedule.
Concurrency controlling techniques ensure that multiple transactions are executed simultaneously while maintaining the ACID properties of the transactions and serializability in the schedules. Locking-based concurrency control protocols use the concept of locking data items.
Generally, a lock compatibility matrix is used which states whether a data item can be locked by two transactions at the same time. Locking-based concurrency control systems can use either one-phase or two-phase locking protocols. In this method, each transaction locks an item before use and releases the lock as soon as it has finished using it.
This locking method provides for maximum concurrency but does not always enforce serializability. In this method, all locking operations precede the first lock-release or unlock operation. The transaction comprise of two phases.
In the first phase, a transaction only acquires all the locks it needs and do not release any lock. This is called the expanding or the growing phase. In the second phase, the transaction releases the locks and cannot request any new locks. This is called the shrinking phase. Every transaction that follows two-phase locking protocol is guaranteed to be serializable. However, this approach provides low parallelism between two conflicting transactions.
These algorithms ensure that transactions commit in the order dictated by their timestamps. An older transaction should commit before a younger transaction, since the older transaction enters the system before the younger one. Timestamp-based concurrency control techniques generate serializable schedules such that the equivalent serial schedule is arranged in order of the age of the participating transactions. This causes the younger transaction to wait for the older transaction to commit first.
This rule prevents the older transaction from committing after the younger transaction has already committed. In systems with low conflict rates, the task of validating every transaction for serializability may lower performance. In these cases, the test for serializability is postponed to just before commit. Since the conflict rate is low, the probability of aborting transactions which are not serializable is also low. This approach is called optimistic concurrency control technique.
T j can commit only after T i has finished execution. T j can start executing only after T i has already committed. T j can start to commit only after T i has already committed. In this section, we will see how the above techniques are implemented in a distributed database system. The basic principle of distributed two-phase locking is same as the basic two-phase locking protocol. However, in a distributed system there are sites designated as lock managers. A lock manager controls lock acquisition requests from transaction monitors.
In order to enforce co-ordination between the lock managers in various sites, at least one site is given the authority to see all transactions and detect lock conflicts. All the sites in the environment know the location of the central lock manager and obtain lock from it during transactions. Each of these sites has the responsibility of managing a defined set of locks.
- The City of Refuge (The Memphis Cycle, Book 1).
- Related Research Unit(s).
- Human Development in the Twenty-First Century: Visionary Ideas from Systems Scientists.
- Pediatric Anesthesia Practice.
The location of the lock manager is based upon data distribution and replication. In a centralized system, timestamp of any transaction is determined by the physical clock reading.
Distributed Databases in Real-Time Control, Volume 6
For implementing timestamp ordering algorithms, each site has a scheduler that maintains a separate queue for each transaction manager. The scheduler puts the request to the corresponding queue in increasing timestamp order. Requests are processed from the front of the queues in the order of their timestamps, i. Another method is to create conflict graphs. For this transaction classes are defined.
A transaction class contains two set of data items called read set and write set. In the read phase, each transaction issues its read requests for the data items in its read set. In the write phase, each transaction issues its write requests.