Distributed Databases
Definition
- A single database that is spread physically across computers in multiple locations that are connected by data communication links (database stored in more than one place)
Why have distributed databases?
- Allows local business units to have control over data
- Allows data in local databases to be used together for decision making based upon the entire dataset
- Reduces telecommunications costs by using the local database rather than a distant one
- Reduces the risk of a telecommunications failure having a major impact, as less telecommunications hardware is used
- The database is spread across different sites
- Each remote site holds only the data that is relevant to itself
There are 3 core types:
- Partitioned between sites
- Entire databases duplicated at each site
- Central database with remote local databases
Partitioned between sites
- Not every location (node) needs all the data, so the partitioned approach gives each node only the data that is relevant to it.
- If a node requires data that is not held in its local database, a request for the data can be sent through the central computer (which holds a copy of all the data or can link to it).
- The central copy is updated during periods when the load on the database is low, typically overnight.
- The data is split between sites either:
- Vertically: different columns of a table located at different sites
- E.g. stock descriptions (country of origin, supplier name) at one site and prices at another site
- Horizontally: different records/rows of a table located at different sites
- E.g. departments of a supermarket, fruit and veg at one site, dairy products at another site
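The two ways of splitting a table can be sketched in a few lines of Python. This is only an illustration: the stock table, its field names and the helper functions `vertical_partition` and `horizontal_partition` are invented for the example and are not part of any real DBMS.

```python
# Hypothetical stock table; field names and values are invented for illustration.
stock = [
    {"id": 1, "department": "fruit and veg", "supplier": "Acme Farms", "origin": "Spain", "price": 0.50},
    {"id": 2, "department": "dairy", "supplier": "Dale Dairy", "origin": "UK", "price": 1.20},
    {"id": 3, "department": "fruit and veg", "supplier": "Acme Farms", "origin": "Kenya", "price": 0.80},
]


def vertical_partition(rows, key, columns):
    """Keep the key plus the chosen columns of every row."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]


def horizontal_partition(rows, predicate):
    """Keep the whole of each row that satisfies the predicate."""
    return [row for row in rows if predicate(row)]


# Vertical split: descriptions at one site, prices at another.
descriptions_site = vertical_partition(stock, "id", ["supplier", "origin"])
prices_site = vertical_partition(stock, "id", ["price"])

# Horizontal split: one site per supermarket department.
fruit_site = horizontal_partition(stock, lambda r: r["department"] == "fruit and veg")
dairy_site = horizontal_partition(stock, lambda r: r["department"] == "dairy")
```

Note that each vertical fragment keeps the `id` column, since without a shared key the columns could not be matched up again later.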
Advantages
- Data is stored close to where it is used, leading to an increase in efficiency
- Local access optimization leading to better performance
- Only relevant data is available at each site, which leads to better security
Disadvantages
- Accessing data across partitions (different sites) leads to inconsistent access speed
- No data replication makes backups essential
- Potential exists for inconsistency in the data stored
- Additional disadvantage for vertical:
- Combining data across partitions is more difficult because it requires joins (more complex than combining horizontally split data)
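The difference in effort when recombining the two kinds of partition can be illustrated with a rough Python sketch. The fragments and the functions `recombine_horizontal` and `recombine_vertical` are hypothetical, made up for this example: horizontal fragments are simply appended together, whereas vertical fragments have to be matched row by row on a shared key (a join).

```python
# Horizontally split fragments: each site holds whole rows (invented data).
fruit_site = [{"id": 1, "name": "apples", "price": 0.50}]
dairy_site = [{"id": 2, "name": "milk", "price": 1.20}]

# Vertically split fragments: each site holds some columns plus the key.
descriptions_site = [{"id": 1, "name": "apples"}, {"id": 2, "name": "milk"}]
prices_site = [{"id": 2, "price": 1.20}, {"id": 1, "price": 0.50}]


def recombine_horizontal(*fragments):
    """A union: append the rows of every fragment."""
    return [row for fragment in fragments for row in fragment]


def recombine_vertical(left, right, key):
    """A join: match rows from the two fragments on their shared key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


from_horizontal = recombine_horizontal(fruit_site, dairy_site)
from_vertical = recombine_vertical(descriptions_site, prices_site, "id")
```

Both calls rebuild the same full table, but the vertical version needs the extra lookup step, and it would fail to rebuild a row if either site were missing its half.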
Entire databases duplicated at each site
- Instead of holding only the data that is relevant at each node, copies of the entire database are held at each node.
- There is a problem with data integrity. Assume that node B updates record 1 locally, and node C then also updates record 1 locally. The copies now differ: node C's update is the later one, so it can be assumed to be more up to date and therefore more correct, but node B's copy still holds the older value.
- This is solved by effective record locking and effective database management software to control access to the data.
- Hardware requirements are heavy as each node needs enough equipment to be able to handle the entire database.
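A minimal sketch of how record locking prevents the conflict between nodes B and C described above. It assumes a single shared lock table and that every update is copied to all nodes immediately; the class `ReplicatedDatabase` and its methods are hypothetical and far simpler than real database management software.

```python
class ReplicatedDatabase:
    """Hypothetical model: every node holds a full copy of the data,
    and one shared lock table guards the records."""

    def __init__(self, node_names, records):
        self.copies = {name: dict(records) for name in node_names}
        self.locks = {}  # record id -> node currently holding the lock

    def lock(self, node, record_id):
        """Grant the lock only if no other node already holds it."""
        holder = self.locks.get(record_id)
        if holder is not None and holder != node:
            return False
        self.locks[record_id] = node
        return True

    def update(self, node, record_id, value):
        """Apply the change to every copy; refused unless the node holds the lock."""
        if self.locks.get(record_id) != node:
            raise PermissionError(f"{node} does not hold the lock on record {record_id}")
        for copy in self.copies.values():
            copy[record_id] = value

    def unlock(self, node, record_id):
        """Release the lock if this node holds it."""
        if self.locks.get(record_id) == node:
            del self.locks[record_id]


db = ReplicatedDatabase(["A", "B", "C"], {1: "original"})

b_got_lock = db.lock("B", 1)   # B locks record 1
c_got_lock = db.lock("C", 1)   # C is refused while B holds the lock
db.update("B", 1, "changed by B")
db.unlock("B", 1)

c_retry = db.lock("C", 1)      # C can lock the record once B has finished
db.update("C", 1, "changed by C")
db.unlock("C", 1)
```

Because node C has to wait for node B to finish, the two updates happen one after the other and every copy ends up holding the same value.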
Central database with remote local databases
- No data is held at the local node. Instead, an index is held locally and is used to find and then access the data in the central database.
- Indexes are the key data used to search the main database. Re-sorting an index into order when data is changed takes time, but a sorted index allows for fast searching of data
- Very little hardware is required at the nodes, but the indexes need updating. In this method there is a lot of network traffic.
- A ‘light’ alternative is to store the databases relevant to individual sites at that site, with each site given an index to all of the databases
- When data is required, the index gives the location of the data – this is not a central location but the location of the site that holds the required database
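The ‘light’ alternative can be pictured as a lookup table that maps each record key to the site holding it. The sketch below is illustrative only: the site names, record keys and the functions `build_index` and `fetch` are assumptions made for the example.

```python
# Hypothetical sites, each holding its own local database (invented data).
sites = {
    "London": {"C001": "Customer record C001", "C002": "Customer record C002"},
    "Leeds": {"C101": "Customer record C101"},
}


def build_index(sites):
    """Map every record key to the name of the site that stores it."""
    return {key: site for site, database in sites.items() for key in database}


def fetch(key, local_site, index, sites):
    """Look up the owning site in the index, then read the record from that site."""
    owner = index[key]
    is_remote = owner != local_site  # True means a network request would be needed
    return sites[owner][key], owner, is_remote


index = build_index(sites)  # a copy of this index would be given to every site
record, owner, is_remote = fetch("C101", "London", index, sites)
```

Here the London node finds from its index that record C101 lives at Leeds, so only that one request has to cross the network. The index itself must be rebuilt or updated whenever records are added or moved.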
Advantages and Disadvantages of Methods 2 and 3
- Centralised database is useful for statistical analysis (e.g. sales figures) and backup
- A distributed database may be less secure with more points of access for hackers
- Decentralising increases complexity but reduces network traffic
- Poor record locking or a poor DBMS causes data reliability and integrity problems
Implementing Distributed Database Systems
Advantages
- Organizational structure
- Breaks the network down with greater control over local access
- Security
- Can readily limit read/write access to different areas of the database
- Local autonomy
- Each local area is responsible for maintenance of its database/access (could be a disadvantage if one site is not up to scratch)
- Errors are simpler to correct at a local level than a national level
- Improved availability (not all or nothing)
- Give access to parts of the database as required (some parts just require longer access time)
- Improved reliability (replication)
- Faster performance when data is held locally
- If one node breaks down, the rest can continue to function (this protects against the loss of the central node)
- Economics
- Smaller computers required at central and local nodes (assuming the database is not fully replicated)
- Less transmission cost as traffic is reduced (except index updates)
- Modular Growth
- Easier to create new local nodes (anywhere in the world with satellite or undersea communications)
Disadvantages
- Complexity
- Maintaining indexes, data locations, updates, etc. is complex
- Cost
- Often requires processing and storage at each site
- Permanent high-speed links are needed between sites
- Security
- Many locations and entry points to the system that need to be accounted for
- Integrity control
- Data integrity must be maintained: it must not be possible for one record to be updated at two sites at the same time