Revision: OCR A2 ICT - Distributed Databases

Distributed Databases

Definition

  • A single database that is spread physically across computers in multiple locations that are connected by data communication links (database stored in more than one place)


Why have distributed databases?

  • Allows local business units to have control over data
  • Allows data in local databases to be used together for decision making based upon the entire dataset


  • Reduces telecommunications costs by using the local database rather than a distant one
  • Reduces the risk of telecommunications failures having a major impact, as less telecommunications hardware is used


The database is spread across different sites

Each remote site holds only the data that is relevant to itself

There are 3 core types:

  • Partitioned between sites
  • Entire databases duplicated at each site
  • Central database with remote local databases


Partitioned between sites

  • Not every location (node) needs to have all the data. The partitioned approach therefore gives each node only the data that is relevant to it.
  • If a node requires data that is not held in its local database, a request for the data can be sent through the central computer (which holds a copy of all data or can link to it).
  • The central copy is updated during periods when the load on the database is lower, generally overnight.
  • The data is split between sites either:
    • Vertically : different columns of a table located at different sites
      • E.g. stock descriptions (country of origin, supplier name) at one site and prices at another site
    • Horizontally : different records/rows of a table located at different sites
      • E.g. departments of a supermarket, fruit and veg at one site, dairy products at another site
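As a rough illustration of the two ways of splitting a table, the sketch below partitions a small stock table in plain Python. The table contents, column names and function names are made up for the example; a real DBMS would handle this through its own configuration rather than code like this.

```python
# Hypothetical stock table: one dict per record (all values are made up).
stock = [
    {"id": 1, "dept": "fruit and veg", "supplier": "Supplier A", "price": 0.30},
    {"id": 2, "dept": "dairy", "supplier": "Supplier B", "price": 1.10},
    {"id": 3, "dept": "fruit and veg", "supplier": "Supplier C", "price": 0.25},
]


def split_horizontally(rows, column):
    """Group whole records by the value of one column (one group per site)."""
    sites = {}
    for row in rows:
        sites.setdefault(row[column], []).append(row)
    return sites


def split_vertically(rows, key, columns):
    """Keep only the chosen columns, plus the key so rows can be matched up later."""
    return [{name: row[name] for name in (key, *columns)} for row in rows]


# Horizontal: each department's records would live at a different site.
by_dept = split_horizontally(stock, "dept")

# Vertical: descriptions at one site, prices at another, both keyed on "id".
descriptions = split_vertically(stock, "id", ["dept", "supplier"])
prices = split_vertically(stock, "id", ["price"])
```

Note that the vertical fragments both keep the `id` column. Without a shared key there would be no way to tell which price belongs to which description.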


Advantages

  • Data is stored close to where it is used, leading to an increase in efficiency
  • Local access optimization leading to better performance
  • Only relevant data is available which leads to better security

Disadvantages

  • Accessing data across partitions (different sites) leads to inconsistent access speed
  • No data replication makes backups essential
  • Potential exists for inconsistency in the data stored
  • Additional disadvantage for vertical:
    • Combining data across partitions is more difficult because it requires joins (more complex than combining horizontally split data, where the records only need to be appended to each other)
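The extra cost of vertical splitting can be seen in a small sketch (plain Python, with made-up fragments and a hypothetical `id` key). Horizontal fragments are simply appended to each other, while vertical fragments have to be matched up record by record on a shared key, which is what a join does.

```python
# Made-up fragments held at different sites.
fruit_site = [{"id": 1, "name": "apples", "price": 0.30}]
dairy_site = [{"id": 2, "name": "milk", "price": 1.10}]

descriptions_site = [{"id": 1, "name": "apples"}, {"id": 2, "name": "milk"}]
prices_site = [{"id": 2, "price": 1.10}, {"id": 1, "price": 0.30}]


def combine_horizontal(*fragments):
    """Horizontal fragments hold whole records, so they are just appended."""
    combined = []
    for fragment in fragments:
        combined.extend(fragment)
    return combined


def combine_vertical(left, right, key):
    """Vertical fragments hold part of each record, so match rows on the key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


all_rows = combine_horizontal(fruit_site, dairy_site)
joined = combine_vertical(descriptions_site, prices_site, "id")
```

Both calls rebuild the same two complete records, but the vertical version needs a lookup on the key for every record and has to decide what to do when a key is missing at one site (here the record is skipped).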

Entire databases duplicated at each site

  • Instead of holding only the data that is relevant at each node, copies of the entire database are held at each node.
  • There is a problem with data integrity. Suppose node B updates record 1 locally, and node C then updates its own copy of record 1. The copies now differ. Because node C's update came after node B's, the node C data is assumed to be more up to date and therefore more correct.
  • This is solved by effective record locking and effective database management software to control access to the data.
  • Hardware requirements are heavy as each node needs enough equipment to be able to handle the entire database.
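A minimal sketch of the record locking idea, simulated in a single Python process. The `LockManager` class, the node names and the `update_everywhere` helper are all hypothetical; a real DBMS does this across the network and also has to cope with nodes or links failing part way through.

```python
class LockManager:
    """Grants the lock on each record to at most one node at a time."""

    def __init__(self):
        self.holders = {}  # record id -> name of the node holding the lock

    def acquire(self, record_id, node):
        if record_id in self.holders:
            return False  # another node is already updating this record
        self.holders[record_id] = node
        return True

    def release(self, record_id, node):
        # Only the node that holds the lock may release it.
        if self.holders.get(record_id) == node:
            del self.holders[record_id]


def update_everywhere(locks, replicas, node, record_id, value):
    """Update a record on every copy, but only while holding its lock."""
    if not locks.acquire(record_id, node):
        return False
    try:
        for copy in replicas.values():
            copy[record_id] = value
    finally:
        locks.release(record_id, node)
    return True


# Three nodes, each holding a full copy of the (one-record) database.
replicas = {"A": {1: "old"}, "B": {1: "old"}, "C": {1: "old"}}
locks = LockManager()

locks.acquire(1, "B")  # node B starts editing record 1
blocked = update_everywhere(locks, replicas, "C", 1, "from C")  # refused
locks.release(1, "B")
applied = update_everywhere(locks, replicas, "C", 1, "from C")  # now allowed
```

While node B holds the lock, node C's update is refused, so the copies never disagree about record 1. Once the lock is released the update goes to every copy together.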

Central database with remote local databases

  • No data is held at the local node. Instead an index is held locally, and this is used to find and then access the data in the central database.
  • Indexes are the key data used to search the main database. Re-sorting an index into order when data is changed takes time, but a sorted index allows for fast searching of data
  • Very little hardware is required at the nodes, but the indexes need updating. In this method there is a lot of network traffic.
  • A ‘light’ alternative is to store the databases relevant to individual sites at that site, with an index being given to all databases
  • When data is required, the index gives the location of the data – this is not a central location but the location of the site that holds the required database
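The trade-off between keeping an index sorted and searching it quickly can be sketched with Python's standard `bisect` module. The keys, site names and function names below are made up for illustration, and the index is an ordinary list rather than whatever structure a real DBMS would use.

```python
import bisect

# Made-up index: (key, site) pairs kept sorted by key so they can be binary searched.
index = [(101, "Leeds"), (205, "Cardiff"), (310, "Leeds")]


def add_to_index(index, key, site):
    """Insert a new entry while keeping the index sorted (the costly part)."""
    bisect.insort(index, (key, site))


def find_site(index, key):
    """Binary search the sorted index for the site holding the record."""
    pos = bisect.bisect_left(index, (key,))
    if pos < len(index) and index[pos][0] == key:
        return index[pos][1]
    return None  # the key is not in the index


add_to_index(index, 150, "Glasgow")
site = find_site(index, 150)
missing = find_site(index, 999)
```

The lookup only tells the node where the record lives. Fetching the record itself still means a request over the network to that site, which is where the traffic in this method comes from.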

Advantages and Disadvantages of Methods 2 and 3

  • A centralised database is useful for statistical analysis (e.g. sales figures) and backup
  • A distributed database may be less secure, with more points of access for hackers
  • Decentralising increases complexity but reduces network traffic
  • Poor record locking and a poor DBMS cause data reliability/integrity problems

Implementing Distributed Database Systems

Advantages

  • Organizational structure
    • Breaks the network down with greater control over local access
  • Security
    • Can readily limit read/write access to different areas of the database
  • Local autonomy
    • Each local area is responsible for maintenance of its database/access (could be a disadvantage if one site is not up to scratch)
    • Errors are simpler to correct at a local level than a national level
  • Improved availability (not all or nothing)
    • Gives access to parts of the database as required (some parts just require longer access time)
  • Improved reliability (replication)
    • If one node breaks down the rest can function (protects against the loss of the central node)
  • Faster performance when held locally
  • Economics
    • Smaller computers required at central and local nodes (assuming the database is not fully replicated)
    • Less transmission cost as traffic is reduced (except index updates)
  • Modular growth
    • Easier to create new local nodes (anywhere in the world with satellite or undersea communications)


Disadvantages

  • Complexity
    • Maintaining indexes, locations, updating, etc. is complex
  • Cost
    • Often requires processing and storage at each site
    • Permanent high speed links needed between sites
  • Security
    • Many locations and entry points to the system need to be accounted for
  • Integrity control
    • Data integrity needs to be maintained: it must not be possible to have one record updated at two sites at the same time
