Data grid
{{Short description|Set of services used to access, modify and transfer geographically distributed data}}
File:High Level View Data Grid V1.jpg
A data grid is an architecture or set of services that allows users to access, modify and transfer extremely large amounts of geographically distributed data for research purposes. [Allcock, Bill; Chervenak, Ann; Foster, Ian; et al. Data Grid tools: enabling science on big distributed data] Data grids make this possible through a host of middleware applications and services that pull together data and resources from multiple administrative domains and then present them to users upon request.
The data in a data grid can be located at a single site or at multiple sites, where each site can be its own administrative domain governed by a set of security restrictions as to who may access the data. [Venugopal, Srikumar; Buyya, Rajkumar; Ramamohanarao, Kotagiri. A taxonomy of data grids for distributed data sharing - management and processing p.37] Likewise, multiple replicas of the data may be distributed throughout the grid outside their original administrative domain, and the security restrictions placed on the original data must be applied equally to the replicas. [Shorfuzzaman, Mohammad; Graham, Peter; Eskicioglu, Rasit. Adaptive replica placement in hierarchical data grids. p.15] Specifically developed data grid middleware handles the integration between users and the data they request by controlling access while making the data available as efficiently as possible.
Middleware
Middleware provides all the services and applications necessary for efficient management of datasets and files within the data grid while providing users quick access to those datasets and files. [Padala, Pradeep. A survey of data middleware for Grid systems p.1] A number of concepts and tools must be available to make a data grid operationally viable. At the same time, not all data grids require the same capabilities and services, because of differences in access requirements, security and the location of resources relative to users. In any case, most data grids have similar middleware services that provide a universal namespace, a data transport service, a data access service, data replication and a resource management service. Taken together, these services are key to the data grid's functional capabilities.
=Universal namespace=
Because a data grid draws its data from multiple separate systems and networks that use different file naming conventions, it would be difficult for a user to locate data within the grid, and to know they had retrieved what they needed, based solely on existing physical file names (PFNs). A universal or unified namespace makes it possible to create logical file names (LFNs) that can be referenced within the data grid and that map to PFNs. [Padala, Pradeep. A survey of data middleware for Grid systems] When an LFN is requested or queried, all matching PFNs are returned, including any replicas of the requested data. The end user can then choose the most appropriate replica from the returned results. This service is usually provided as part of a management system known as a Storage Resource Broker (SRB). [Arcot, Rajasekar; Wan, Michael; Moore, Reagan; Schroeder, Wayne; Kremenek. Storage resource broker – managing distributed data in a grid] Information about the locations of files and the mappings between LFNs and PFNs may be stored in a metadata or replica catalogue. [Venugopal, Srikumar; Buyya, Rajkumar; Ramamohanarao, Kotagiri. A taxonomy of data grids for distributed data sharing - management and processing p.11] The replica catalogue would contain information about LFNs that map to multiple replica PFNs.
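The LFN-to-PFN mapping can be illustrated with a minimal sketch. The class name `ReplicaCatalogue`, its methods, and the example LFN and host names are hypothetical and chosen only for illustration; they do not reflect the interface of SRB or of any real catalogue.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Toy catalogue mapping one logical file name (LFN) to many physical file names (PFNs)."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, lfn, pfn):
        # Record a physical location for the logical name; ignore duplicates.
        if pfn not in self._replicas[lfn]:
            self._replicas[lfn].append(pfn)

    def lookup(self, lfn):
        # Return every known PFN, replicas included, so the caller can choose one.
        return list(self._replicas.get(lfn, []))


catalogue = ReplicaCatalogue()
# Hypothetical sites holding the same dataset under different physical names.
catalogue.register("lfn://climate/run42.nc", "gsiftp://site-a.example.org/data/r42.nc")
catalogue.register("lfn://climate/run42.nc", "gsiftp://site-b.example.org/mirror/0042.nc")
print(catalogue.lookup("lfn://climate/run42.nc"))
```

A query for an unregistered LFN returns an empty list in this sketch; a real catalogue would also carry metadata such as size, checksum and access policy.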
=Data transport service=
Another middleware service provides data transport, or data transfer. Data transport encompasses more than the transfer of bits; it also covers functions such as fault tolerance and data access. [Coetzee, Serena. Reference model for a data grid approach to address data in a dynamic SDI p.16] Fault tolerance can be achieved in a data grid by providing mechanisms that ensure a data transfer will resume after each interruption until all requested data is received. [Venugopal, Srikumar; Buyya, Rajkumar; Ramamohanarao, Kotagiri. A taxonomy of data grids for distributed data sharing - management and processing p.21] Possible methods range from restarting the entire transmission from the beginning of the data to resuming from the point where the transfer was interrupted. As an example, GridFTP provides fault tolerance by sending data from the last acknowledged byte rather than restarting the entire transfer from the beginning.
The data transport service also provides the low-level access and connections between hosts for file transfer. [Allcock, Bill; Foster, Ian; Nefedova, Veronika; Chervenak, Ann; Deelman, Ewa; Kesselman, Carl. High-performance remote access to climate simulation data: A challenge problem for data grid technologies.] The data transport service may use any of several modes to implement the transfer. These include parallel data transfer, where two or more data streams are used over the same channel, and striped data transfer, where two or more streams access different blocks of the file for simultaneous transfer. The service may also use the underlying built-in capabilities of the network hardware, or specifically developed protocols, to support faster transfer speeds. [Izmailov, Rauf; Ganguly, Samrat; Tu, Nan. Fast parallel file replication in data grid p.2] The data transport service might optionally include a network overlay function to facilitate the routing and transfer of data, as well as file I/O functions that allow users to see remote files as if they were local to their system. The data transport service hides the complexity of access and transfer between the different systems from the user, so the grid appears as one unified data source.
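The idea of resuming from the last acknowledged byte can be sketched as follows. This is an in-memory illustration only, not GridFTP's actual restart mechanism; `transfer_with_restart`, `flaky_send` and the simulated failure at offset 8 are assumptions made for the example.

```python
def transfer_with_restart(source: bytes, send_chunk, chunk_size=4):
    """Send `source` chunk by chunk, resuming from the last acknowledged byte after a failure.

    Returns the number of send attempts. The sketch retries without limit;
    a real service would bound retries and back off between them.
    """
    acked = 0      # offset of the first byte not yet acknowledged
    attempts = 0
    while acked < len(source):
        attempts += 1
        chunk = source[acked:acked + chunk_size]
        try:
            send_chunk(acked, chunk)   # may raise ConnectionError
        except ConnectionError:
            continue                   # retry from `acked`, not from byte 0
        acked += len(chunk)            # advance only after a successful send
    return attempts


received = bytearray()
failures = {8}  # hypothetical: the link drops once when offset 8 is sent


def flaky_send(offset, chunk):
    if offset in failures:
        failures.discard(offset)
        raise ConnectionError("link dropped")
    received.extend(chunk)


attempts = transfer_with_restart(b"0123456789abcdef", flaky_send)
print(attempts, bytes(received))
```

Only the interrupted chunk is sent twice; the eight bytes acknowledged before the failure are never retransmitted.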
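Striped transfer, in which several streams move different blocks of the same file at once, can be sketched as below. Threads copying between in-memory buffers stand in for network streams; `plan_stripes`, `striped_copy` and the round-robin block assignment are assumptions for illustration rather than the behaviour of any particular protocol.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_stripes(file_size, block_size, n_streams):
    """Assign (offset, length) blocks to streams round-robin."""
    plan = [[] for _ in range(n_streams)]
    for i, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)
        plan[i % n_streams].append((offset, length))
    return plan


def striped_copy(data: bytes, block_size=4, n_streams=3):
    """Copy `data` using several streams, each handling its own blocks."""
    out = bytearray(len(data))

    def run_stream(blocks):
        # Each stream writes only its own, non-overlapping byte ranges.
        for offset, length in blocks:
            out[offset:offset + length] = data[offset:offset + length]

    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        # list() forces completion and surfaces any exception from a stream.
        list(pool.map(run_stream, plan_stripes(len(data), block_size, n_streams)))
    return bytes(out)


print(plan_stripes(10, 4, 3))
print(striped_copy(b"abcdefghijklmnopqrstuvwxyz"))
```

Because every block carries its own offset, the receiver can reassemble the file regardless of the order in which the streams finish.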
=Data access service=
Data access services work hand in hand with the data transfer service to provide security, access controls and management of any data transfers within the data grid. [Raman, Vijayshankar; Narang, Inderpal; Crone, Chris; Hass, Laura; Malaika, Susan. Services for data access and data processing on grids] Security services provide mechanisms for authenticating users to ensure they are properly identified. Common forms of authentication include passwords and Kerberos. Authorization services are the mechanisms that control what users are able to access once they have been identified through authentication. Authorization can be as simple as file permissions. Where more stringent control over access to data is needed, however, it is provided by Access Control Lists (ACLs), Role-Based Access Control (RBAC) and Task-Based Authorization Controls (TBAC). [Thomas, R. K. and Sandhu R. S. Task-based authorization controls (tbac): a family of models for active and enterprise-oriented authorization management] These types of controls can provide granular access to files, ranging from limits on access times and duration of access to controls that determine which files can be read or written. The final data access service that might be present is encryption, which protects the confidentiality of data in transport. [Sreelatha, Malempati. Grid based approach for data confidentiality. p.1] The most common form of encryption for this task has been SSL applied while the data is in transport. While all of these access services operate within the data grid, the access services of the various administrative domains that host the datasets stay in place to enforce their own access rules. For this to work, the data grid's access services must be in step with the administrative domains' access services.
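A role-based authorization check with a time-of-day limit might look like the following sketch. The `Grant` record, the role names and the path layout are hypothetical; real grid middleware combines such checks with authentication and with each administrative domain's own policies.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Grant:
    role: str
    path_prefix: str
    actions: frozenset
    start: time = time(0, 0)        # earliest permitted access time
    end: time = time(23, 59, 59)    # latest permitted access time


def is_allowed(grants, user_roles, path, action, now):
    """Return True if any grant held by one of the user's roles covers this request."""
    return any(
        g.role in user_roles
        and path.startswith(g.path_prefix)
        and action in g.actions
        and g.start <= now <= g.end
        for g in grants
    )


# Hypothetical policy: analysts may read at any time, curators may write in office hours.
GRANTS = [
    Grant("analyst", "/climate/", frozenset({"read"})),
    Grant("curator", "/climate/", frozenset({"read", "write"}), time(8, 0), time(18, 0)),
]

print(is_allowed(GRANTS, {"analyst"}, "/climate/run42.nc", "read", time(3, 0)))
print(is_allowed(GRANTS, {"curator"}, "/climate/run42.nc", "write", time(20, 0)))
```

Permissions attach to roles rather than to individual users, so a change in policy means editing the grant list instead of every user record.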
=Data replication service=
To meet the needs for scalability, fast access and user collaboration, most data grids support replication of datasets to points within the distributed storage architecture. [Chervenak, Ann; Schuler, Robert; Kesselman, Carl; Koranda, Scott; Moe, Brian. Wide area data replication for scientific collaborations] Replicas give multiple users faster access to datasets and conserve bandwidth, since they can often be placed strategically close to or within the sites where users need them. However, the replication of datasets and the creation of replicas are bound by the availability of storage within sites and of bandwidth between sites. Replication is controlled by a replica management system, which determines user needs for replicas based on input requests and creates replicas according to the availability of storage and bandwidth. [Lamehamedi, Houda; Szymanski, Boleslaw; Shentu, Zujun; Deelman, Ewa. Data replication strategies in grid environments] All replicas are then cataloged, or added to a directory within the data grid, by location so that users can query them. To perform these tasks, the replica management system needs to be able to manage the underlying storage infrastructure. The data management system also ensures that changes to replicas are propagated to all nodes in a timely manner.
==Replication update strategy==
There are a number of ways the replication management system can handle the updates of replicas. The updates may be designed around a centralized model, where a single master replica updates all others, or a decentralized model, where all peers update each other. The topology of node placement may also influence how replicas are updated. If a hierarchical topology is used, updates flow through specific paths in a tree-like structure. In a flat topology, how updates take place is entirely a matter of the peer relationships between nodes. In a hybrid topology, which combines flat and hierarchical topologies, updates may take place both through specific paths and between peers.
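The difference between hierarchical and flat update flow can be sketched with a breadth-first walk over forwarding links. The function `propagate`, the node names and the link tables are hypothetical and serve only to illustrate the two models.

```python
from collections import deque


def propagate(links, origin):
    """Return the order in which replicas receive an update that starts at `origin`.

    `links` maps each node to the nodes it forwards updates to.
    """
    order, seen, queue = [], {origin}, deque([origin])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in links.get(node, []):
            if nxt not in seen:      # never deliver the same update twice
                seen.add(nxt)
                queue.append(nxt)
    return order


# Hierarchical: updates flow down fixed parent-to-child paths from a master.
tree = {"master": ["eu", "us"], "eu": ["site1", "site2"], "us": ["site3"]}
# Flat: every peer forwards to every other peer, so any peer can originate an update.
peers = {p: [q for q in "abc" if q != p] for p in "abc"}

print(propagate(tree, "master"))
print(propagate(peers, "b"))
```

A hybrid topology would simply mix both kinds of link in the same table.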
==Replication placement strategy==
There are a number of ways the replication management system can handle the creation and placement of replicas to best serve the user community. If the storage architecture supports replica placement and sites have sufficient storage, placement becomes a matter of the needs of the users who access the datasets and of the placement strategy chosen. [Padala, Pradeep. A survey of data middleware for Grid systems] Numerous strategies have been proposed and tested for managing the placement of dataset replicas within the data grid to meet user requirements. No single strategy fits every requirement best; the type of data grid and the access requirements of its user community determine which strategy to use. Replicas can even be created with the files encrypted for confidentiality, which would be useful in a research project dealing with medical files. [Kranthi, G. and Rekha, D. Shashi. Protected data objects replication in data grid p.40] The following sections describe several strategies for replica placement.
===Dynamic replication===
Dynamic replication is an approach to the placement of replicas based on the popularity of the data. [Belalem, Ghalem and Meroufel, Bakhta. Management and placement of replicas in a hierarchical data grid] The method has been designed around a hierarchical replication model. The data management system keeps track of the available storage on all nodes. It also keeps track of which data the clients (users) at a site are requesting, counting each request as a hit. When the number of hits for a specific dataset exceeds the replication threshold, it triggers the creation of a replica on the server that directly services the user's client. If that server, known as the father, does not have sufficient space, the father's father in the hierarchy becomes the target for the replica, and so on up the chain until the chain is exhausted. The data management system algorithm also allows for the dynamic deletion of replicas that have a null access value, or a value lower than the frequency of the data to be stored, in order to free up space. This improves system performance in terms of response time and number of replicas, and helps balance load across the data grid. The method can also use dynamic algorithms that determine whether the cost of creating a replica is truly worth the expected gains given the location.
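The threshold-and-climb behaviour described above can be sketched as follows. This is a simplified illustration, not the published algorithm: the `Node` class, `record_hit` and the storage figures are hypothetical, and the deletion of rarely used replicas and the cost-benefit check are left out.

```python
class Node:
    def __init__(self, name, free_space, parent=None):
        self.name, self.free_space, self.parent = name, free_space, parent
        self.replicas = set()


def record_hit(hits, dataset, size, server, threshold):
    """Count a request; once hits exceed the threshold, try to place a replica.

    Tries the server that directly serves the client first (the "father"),
    then walks up the hierarchy until a node with enough space is found.
    Returns the node that received the replica, or None.
    """
    hits[dataset] = hits.get(dataset, 0) + 1
    if hits[dataset] <= threshold:
        return None
    node = server
    while node is not None:
        if dataset in node.replicas:
            return None              # already replicated on this path
        if node.free_space >= size:
            node.replicas.add(dataset)
            node.free_space -= size
            return node
        node = node.parent           # not enough space: try the father's father
    return None                      # chain exhausted


# Hypothetical three-level hierarchy; the leaf server is too small for the dataset.
root = Node("root", 100)
region = Node("region", 50, root)
leaf = Node("leaf", 5, region)
hits = {}
placed = [record_hit(hits, "ds1", 10, leaf, threshold=2) for _ in range(4)]
print([p.name if p else None for p in placed])
```

In this run the third request crosses the threshold, the leaf lacks space, and the replica lands one level up; the fourth request finds it already in place.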
===Adaptive replication===
This method of replication, like dynamic replication, has been designed around the hierarchical replication model found in most data grids. It uses a similar algorithm, with file access requests being a prime factor in determining which files should be replicated. A key difference, however, is that the number and frequency of replica creations are keyed to a dynamic threshold that is computed from the request arrival rates from clients over a period of time. [Shorfuzzaman, Mohammad; Graham, Peter; Eskicioglu, Rasit. Adaptive replica placement in hierarchical data grids] If the average number of requests exceeds the previous threshold and shows an upward trend, and storage utilization rates indicate capacity for more replicas, more replicas may be created. As with dynamic replication, replicas that have a lower threshold and were not created in the current replication interval can be removed to make space for the new replicas.
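The three conditions in the paragraph above (average above the previous threshold, upward trend, spare storage) can be combined in a short sketch. The function name, the 80% utilisation cap and the rule that the next threshold equals the observed average are all assumptions made for illustration; the cited paper defines its own threshold computation.

```python
def should_replicate(rates, prev_threshold, storage_used, storage_total,
                     max_utilisation=0.8):
    """Decide whether to create more replicas in the next interval.

    `rates` holds request arrival rates for recent intervals, oldest first.
    Returns (decision, new_threshold).
    """
    average = sum(rates) / len(rates)
    rising = len(rates) >= 2 and rates[-1] > rates[0]        # crude upward-trend test
    has_room = storage_used / storage_total < max_utilisation
    decision = average > prev_threshold and rising and has_room
    # Assumption: the next threshold simply tracks the observed average.
    return decision, average


print(should_replicate([10, 20, 30], prev_threshold=15, storage_used=40, storage_total=100))
print(should_replicate([30, 20, 10], prev_threshold=15, storage_used=40, storage_total=100))
```

The second call has the same average but a falling trend, so no new replicas are created.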
===Other replication===
The two strategies above are only two of many possible replication strategies that may be used to place replicas within the data grid where they will improve performance and access. Below are some others that have been proposed and tested alongside the strategies already described. [Ranganathan, Kavitha and Foster, Ian. Identifying dynamic replication strategies for a high performance data grid]
- Static – uses a fixed replica set of nodes with no dynamic changes to the files being replicated.
- Best Client – Each node records the number of requests per file received during a preset time interval; if the number of requests for a file exceeds the set threshold, a replica is created on the best client, the one that requested the file the most; stale replicas are removed based on another algorithm.
- Cascading – Used in a hierarchical node structure, where the number of requests per file received during a preset time interval is compared against a threshold. If the threshold is exceeded, a replica is created at the first tier down from the root; if the threshold is exceeded again, a replica is added to the next tier down, and so on in a waterfall effect until a replica is placed at the client itself.
- Plain Caching – If the client requests a file it is stored as a copy on the client.
- Caching plus Cascading – Combines two strategies of caching and cascading.
- Fast Spread – Also used in a hierarchical node structure this strategy automatically populates all nodes in the path of the client that requests a file.
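Three of the strategies listed above differ mainly in which nodes on the path to the client receive a copy, which the following sketch makes concrete. The function `place_replicas` and the tier names are hypothetical, and request counting against the threshold is assumed to have happened already.

```python
def place_replicas(strategy, path, holders):
    """Return the nodes that gain a replica after one qualifying request.

    `path` lists nodes from the first tier below the root down to the client.
    `holders` is the set of nodes on the path that already hold a replica.
    """
    if strategy == "plain_caching":
        return [path[-1]]                              # copy stored at the client only
    if strategy == "fast_spread":
        return [n for n in path if n not in holders]   # fill every node on the path
    if strategy == "cascading":
        for node in path:                              # highest tier still lacking a replica
            if node not in holders:
                return [node]
        return []
    raise ValueError(f"unknown strategy: {strategy}")


path = ["tier1", "tier2", "client"]
print(place_replicas("cascading", path, set()))
print(place_replicas("cascading", path, {"tier1"}))
print(place_replicas("fast_spread", path, set()))
print(place_replicas("plain_caching", path, set()))
```

Cascading moves one tier per threshold crossing, Fast Spread fills the whole path at once, and Plain Caching touches only the client.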
=Task scheduling and resource allocation=
Characteristics of data grid systems such as large scale and heterogeneity require specific methods of task scheduling and resource allocation. To address the problem, the majority of systems use extended versions of classic scheduling methods. [Epimakhov, Igor; Hameurlain, Abdelkader; Dillon, Tharam; Morvan, Franck. Resource Scheduling Methods for Query Optimization in Data Grid Systems] Others use fundamentally different methods based on incentives for autonomous nodes, such as virtual money or the reputation of a node.
Another characteristic of data grids is their dynamic nature: nodes continuously connect and disconnect, and local load becomes imbalanced while tasks execute. This can make the initial resource allocation for a task obsolete or non-optimal. As a result, many data grids use execution-time adaptation techniques that allow the system to react to dynamic changes: balancing the load, replacing disconnected nodes, taking advantage of newly connected nodes, and recovering task execution after faults.
=Resource management system (RMS)=
The resource management system represents the core functionality of the data grid. It is the heart of the system and manages all actions related to storage resources. In some data grids, differing administrative policies and the diversity of resources within the grid may make it necessary to create a federated RMS architecture instead of using a single RMS. In such a case, the RMSs in the federation employ an architecture that allows for interoperability, based on an agreed-upon set of protocols for actions related to storage resources. [Krauter, Klaus; Buyya, Rajkumar; Maheswaran, Muthucumaru. A taxonomy and survey of grid resource management systems for distributed computing]
==RMS functional capabilities==
- Fulfillment of user and application requests for data resources based on type of request and policies; RMS will be able to support multiple policies and multiple requests concurrently
- Scheduling, timing and creation of replicas
- Policy and security enforcement within the data grid resources to include authentication, authorization and access
- Support systems with different administrative policies to inter-operate while preserving site autonomy
- Support quality of service (QoS) when requested if feature available
- Enforce system fault tolerance and stability requirements
- Manage resources, i.e. disk storage, network bandwidth and any other resources that interact directly or as part of the data grid
- Manage trusts concerning resources in administrative domains; some domains may place additional restrictions on how they participate, requiring adaptation of the RMS or federation
- Support adaptability, extensibility and scalability in relation to the data grid
Topology
File:Data Grid Multiple Topologies 1.jpg
Data grids have been designed with multiple topologies in mind to meet the needs of the scientific community. The accompanying figure shows four topologies that have been used in data grids. [Zhu, Lichun. Metadata management in grid database federation] Each topology is best suited to a specific purpose, as explained below.
Federation topology is the choice for institutions that wish to share data from already existing systems. It allows each institution control over their data. When an institution with proper authorization requests data from another institution it is up to the institution receiving the request to determine if the data will go to the requesting institution. The federation can be loosely integrated between institutions, tightly integrated or a combination of both.
Monadic topology has a central repository into which all collected data is fed. The central repository then responds to all queries for data. Unlike the other topologies, this one has no replicas. Data is accessed only from the central repository, which could be by way of a web portal. One project that uses this data grid topology is the Network for Earthquake Engineering Simulation (NEES) in the United States. [Venugopal, Srikumar; Buyya, Rajkumar; Ramamohanarao, Kotagiri. A taxonomy of data grids for distributed data sharing - management and processing p.16] This works well when all access to the data is local or within a single region with high-speed connectivity.
Hierarchical topology lends itself to collaboration where there is a single source for the data and it needs to be distributed to multiple locations around the world. One project that benefits from this topology is CERN, which runs the Large Hadron Collider and generates enormous amounts of data. This data is located at one source and needs to be distributed around the world to the organizations collaborating in the project.
Hybrid topology is simply a configuration whose architecture consists of any combination of the previously mentioned topologies. It is used mostly in situations where researchers working on projects want to share their results to further research by making them readily available for collaboration.
History
The need for data grids was first recognized by the scientific community in connection with climate modeling, where terabyte- and petabyte-sized data sets were becoming the norm for transport between sites. More recent research requirements for data grids have been driven by the Large Hadron Collider (LHC) at CERN, the Laser Interferometer Gravitational Wave Observatory (LIGO), and the Sloan Digital Sky Survey (SDSS). These scientific instruments produce large amounts of data that need to be accessible to large groups of geographically dispersed researchers. [Allcock, Bill; Chervenak, Ann; Foster, Ian; et al. p.571] [Tierney, Brian L. Data grids and data grid performance issues. p.7] Other uses for data grids involve governments, hospitals, schools and businesses, where efforts are under way to improve services and reduce costs by providing access to dispersed and separate data systems through data grids. [Thibodeau, P. Governments plan data grid projects]
From its earliest beginnings, the concept of a data grid to support the scientific community was thought of as a specialized extension of the “grid”, which itself was first envisioned as a way to link supercomputers into meta-computers. [Heingartner, Douglas. The grid: the next-gen internet] That meaning was short-lived, however, and the grid came to mean the ability to connect computers anywhere on the web to get access to any desired files and resources, similar to the way electricity is delivered over a grid by simply plugging in a device. The device gets electricity through its connection, and the connection is not limited to a specific outlet. From this, the data grid was proposed as an integrating architecture capable of delivering resources for distributed computations. It would also be able to service from numerous to thousands of queries at the same time while delivering gigabytes to terabytes of data for each query. The data grid would include its own management infrastructure, capable of managing all aspects of the data grid's performance and operation across multiple wide area networks while working within the existing framework known as the web. [Heingartner, Douglas. The grid: the next-gen internet]
The data grid has also been defined more recently in terms of usability: what must a data grid be able to do in order to be useful to the scientific community? Proponents of this view arrived at several criteria. [Venugopal, Srikumar; Buyya, Rajkumar; Ramamohanarao, Kotagiri. A taxonomy of data grids for distributed data sharing - management and processing p.1] One, users should be able to search for and discover applicable resources within the data grid from among its many datasets. Two, users should be able to locate the datasets within the data grid that are most suitable for their requirements from among numerous replicas. Three, users should be able to transfer and move large datasets between points in a short amount of time. Four, the data grid should provide a means to manage multiple copies of datasets within the data grid. And finally, the data grid should provide security with user access controls, i.e. which users are allowed to access which data.
The data grid is an evolving technology that continues to change and grow to meet the needs of an expanding community. One of the earliest programs begun to make data grids a reality was funded by the Defense Advanced Research Projects Agency (DARPA) in 1997 at the University of Chicago. [Globus. About the globus toolkit] The research spawned by DARPA has continued toward the creation of open source tools that make data grids possible. As new requirements for data grids emerge, projects like the Globus Toolkit will emerge or expand to fill the gap. Data grids, along with the "Grid", will continue to evolve.
Notes
{{Reflist}}
References
- {{cite journal
|last1= Allcock
|first1= Bill |last2= Chervenak |first2= Ann |last3= Foster |first3= Ian |last4= Kesselman |first4= Carl |last5= Livny |first5= Miron
|year= 2005
|title= Data Grid tools: enabling science on big distributed data
|journal= Journal of Physics: Conference Series
|volume= 16
|issue= 1 |pages= 571–575
|doi= 10.1088/1742-6596/16/1/079
|citeseerx= 10.1.1.379.4325|bibcode= 2005JPhCS..16..571A |s2cid= 250673712 }}
- {{cite journal
|last1=Allcock
|first1=Bill |last2=Foster |first2=Ian |last3=Nefedova |first3= Veronika |last4= Chervenak |first4= Ann |last5= Deelman |first5= Ewa | author5-link = Ewa Deelman |last6= Kesselman |first6= Carl |last7= Lee |first7= Jason |last8= Sim |first8= Alex |last9= Shoshani |first9= Arie |last10= Drach |first10=Bob |last11= Williams |first11= Dean
|title= High-performance remote access to climate simulation data: A challenge problem for data grid technologies
|publisher = ACM Press
|year = 2001
|citeseerx = 10.1.1.64.6603
|access-date = }}
- {{cite web
|last1=Arcot
|first1=Rajasekar
|last2=Wan
|first2=Michael
|last3=Moore
|first3=Reagan
|last4=Schroeder
|first4=Wayne
|last5=Kremenek
|first5=George
|title=Storage resource broker – managing distributed data in a grid
|url=http://www.npaci.edu/DICE/Pubs/CSI-paper-sent.doc
|access-date=April 28, 2012
|url-status=dead
|archive-url=https://web.archive.org/web/20060507193028/http://www.npaci.edu/DICE/Pubs/CSI-paper-sent.doc
|archive-date=May 7, 2006
}}
- {{cite journal
|last1= Belalem
|first1= Ghalem |last2= Meroufel |first2= Bakhta
|year= 2011
|title= Management and placement of replicas in a hierarchical data grid
|journal= International Journal of Distributed and Parallel Systems
|volume= 2
|issue= 6
|pages= 23–30
|doi= 10.5121/ijdps.2011.2603
|url= https://www.scribd.com/doc/75105419/Management-and-Placement-of-Replicas-in-a-Hierarchical-Data-Grid
|access-date= April 28, 2012|doi-access= free
}}
- {{cite journal
|last1= Chervenak
|first1= A.|last2= Foster |first2= I. |last3= Kesselman |first3= C.|last4= Salisbury |first4= C. |last5= Tuecke |first5= S.
|year= 2001
|title= The data grid: towards an architecture for the distributed management and analysis of large scientific datasets
|journal= Journal of Network and Computer Applications
|volume= 23
|issue= 3
|pages= 187–200
|doi= 10.1006/jnca.2000.0110
|url= http://www.globus.org/alliance/publications/papers/JNCApaper.pdf
|access-date= April 11, 2012|citeseerx= 10.1.1.32.6963}}
- {{cite web
|last1= Chervenak
|first1= Ann |last2= Schuler |first2= Robert |last3= Kesselman | first3= Carl |last4= Koranda |first4= Scott |last5= Moe |first5= Brian
|title= Wide area data replication for scientific collaborations
|publisher = IEEE
|date = November 14, 2005
|url= http://www.globus.org/alliance/publications/papers/chervenakGrid2005.pdf
| access-date = April 25, 2012 }}
- {{cite journal
|last1=Coetzee
|first1=Serena
|year=2012
|title=Reference model for a data grid approach to address data in a dynamic SDI
|journal=GeoInformatica
|volume=16
|issue=1
|pages=111–129
|doi=10.1007/s10707-011-0129-4
|hdl=2263/18263
|s2cid=19837152
|hdl-access=free
}}
- {{cite conference
|last1= Epimakhov
|first1= Igor
|last2= Hameurlain
|first2= Abdelkader
|last3= Dillon
|first3= Tharam
|last4= Morvan
|first4= Franck
|title= Resource Scheduling Methods for Query Optimization in Data Grid Systems
|book-title = Advances in Databases and Information Systems. 15th International Conference, ADBIS 2011
|pages = 185–199
|publisher = Springer Berlin Heidelberg
|year = 2011
|location = Vienna, Austria
|doi = 10.1007/978-3-642-23737-9_14
}}
- {{cite web
|last1= Globus
|title= About the globus toolkit
|publisher= Globus
|year= 2012
|url= http://www.globus.org/toolkit/about.html
|access-date = May 27, 2012 }}
- {{cite magazine
|last1=Heingartner
|first1=Douglas
|title=The Grid: The Next-Gen Internet
|magazine=Wired
|date=March 8, 2001
|url=https://www.wired.com/science/discoveries/news/2001/03/42230
|access-date=May 13, 2012
|url-status=dead
|archive-url=https://web.archive.org/web/20120504035536/http://www.wired.com/science/discoveries/news/2001/03/42230
|archive-date=May 4, 2012
}}
- {{cite web
|last1=Izmailov
|first1=Rauf
|last2=Ganguly
|first2=Samrat
|last3=Tu
|first3=Nan
|title=Fast parallel file replication in data grid
|year=2004
|url=http://www.cs.huji.ac.il/labs/danss/p2p/resources/fast-parallel-file-replication-on-data-grid.pdf
|access-date=May 10, 2012
|url-status=dead
|archive-url=https://web.archive.org/web/20120421081052/http://www.cs.huji.ac.il/labs/danss/p2p/resources/fast-parallel-file-replication-on-data-grid.pdf
|archive-date=April 21, 2012
}}
- {{cite journal
|last1=Kranthi
|first1=G. Aruna
|last2=Rekha
|first2=D. Shashi
|year=2012
|title=Protected data objects replication in data grid
|journal=International Journal of Network Security & Its Applications
|volume=4
|issue=1
|pages=29–41
|doi=10.5121/ijnsa.2012.4103
|issn=0975-2307
|doi-access=free
}}
- {{cite journal
|last1= Krauter
|first1= Klaus |last2= Buyya |first2= Rajkumar |last3= Maheswaran |first3= Muthucumaru
|year= 2002
|title= A taxonomy and survey of grid resource management systems for distributed computing
|journal= Software: Practice and Experience
|volume= 32
|issue= 2
|pages= 135–164
|doi=10.1002/spe.432
|citeseerx = 10.1.1.38.2122
|s2cid= 816774 |access-date= }}
- {{cite conference
|last1= Lamehamedi
|first1= Houda |last2= Szymanski |first2= Boleslaw |last3= Shentu |first3= Zujun |last4= Deelman |first4= Ewa |author4-link=Ewa Deelman
|title = Data replication strategies in grid environments
|book-title = Fifth International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP’02)
|pages = 378–383
|publisher = Press
|year = 2002
|citeseerx = 10.1.1.11.5473
|access-date = }}
- {{cite journal
|last1= Padala
|first1= Pradeep
|title= A survey of data middleware for Grid systems
|citeseerx = 10.1.1.114.1901
|access-date = }}
- {{cite web
|last1= Raman
|first1= Vijayshankar |last2= Narang |first2= Inderpal |last3= Crone |first3= Chris |last4= Hass |first4= Laura |last5= Malaika |first5= Susan
|title= Services for data access and data processing on grids
|date = February 9, 2003
|url = http://www.ogf.org/documents/GFD.14.pdf
|access-date = May 10, 2012 }}
- {{cite conference
|last1= Ranganathan
|first1= Kavitha | last2= Foster |first2= Ian
|title= Identifying dynamic replication strategies for a high performance data grid
|book-title= Proc. of the International Grid Computing Workshop
|pages= 75–86
|year= 2001
|citeseerx = 10.1.1.20.6836
|doi= 10.1007/3-540-45644-9_8
|access-date = }}
- {{cite journal
|last1= Shorfuzzaman
|first1= Mohammad |last2= Graham |first2= Peter |last3= Eskicioglu |first3= Rasit
|year= 2010
|title= Adaptive replica placement in hierarchical data grids
|journal= Journal of Physics: Conference Series
|volume= 256
|issue= 1
|pages= 1–18
|doi= 10.1088/1742-6596/256/1/012020
|bibcode= 2010JPhCS.256a2020S |doi-access= free
}}
- {{cite journal
|last1= Sreelatha
|first1= Malempati
|year= 2011
|title= Grid based approach for data confidentiality
|journal= International Journal of Computer Applications
|volume= 25
|issue= 9
|pages= 1–5
|doi= 10.5120/3063-4186
|issn = 0975-8887
|bibcode= 2011IJCA...25i...1M|citeseerx= 10.1.1.259.4326
}}
- {{cite journal
|last1= Thibodeau
|first1=P.
|title= Governments plan data grid projects
|journal= Computerworld
|volume= 39
|issue= 42
|pages= 14
|date = May 30, 2005
|url = http://www.computerworld.com/s/article/102119/Governments_Plan_Data_Grid_Projects
|issn= 0010-4841
|access-date = April 28, 2012 }}
- {{cite web
|last1= Thomas
|first1= R. K. |last2= Sandhu |first2= R. S.
|title= Task-based authorization controls (tbac): a family of models for active and enterprise-oriented authorization management
|year = 1997
|url = http://profsandhu.com/confrnc/ifip/i97tbac.pdf
|access-date = April 28, 2012 }}
- {{cite web
|last1=Tierney
|first1=Brian L.
|title= Data grids and data grid performance issues
|year = 2000
|url = http://www-didc.lbl.gov/presentations/CSC2000-tierney.pdf
|access-date = April 28, 2012 }}
- {{cite journal
|last1= Venugopal
|first1= Srikumar |last2= Buyya |first2= Rajkumar |last3= Ramamohanarao |first3= Kotagiri
|year= 2006
|title= A taxonomy of data grids for distributed data sharing, management and processing
|journal= ACM Computing Surveys
|volume= 38
|issue= 1
|pages= 1–60
|doi=10.1145/1132952.1132955
|url= http://www.cloudbus.org/reports/DataGridTaxonomy.pdf
|access-date= April 10, 2012|arxiv= cs/0506034|citeseerx= 10.1.1.59.6924 |s2cid= 1379579 }}
- {{cite web
|last1=Zhu
|first1=Lichun
|title=Metadata management in grid database federation
|url=http://cs.uwindsor.ca/richard/cs510/lichun_zhu_survey.pdf
|access-date=May 15, 2012
}}{{dead link|date=December 2016 |bot=InternetArchiveBot |fix-attempted=yes }}
Further reading
- {{cite web
|last1= Allcock
|first1= W.
|author-link = W. Allcock
|title= GridFTP: protocol extensions to FTP for the grid
|publisher = Argonne National Laboratory
|date = April 2003
|url = http://www.globus.org/alliance/publications/papers/GFD-R.0201.pdf
| access-date = April 20, 2012 }}
- {{cite web
|last1= Allcock
|first1= W.|last2= Bresnahan |first2= J. |last3= Kettimuthu |first3= R.|last4= Link |first4= M.|last5= Dumitrescu |first5= C.|last6= Raicu |first6= I.|last7= Foster |first7= I.
|title= The Globus striped GridFTP framework and server
|publisher= ACM Press
|date= November 2005
|url= http://www.globus.org/alliance/publications/papers/gridftp_final.pdf
|access-date = April 20, 2012 }}
- {{cite journal
|last1= Foster
|first1= Ian |last2= Kesselman |first2= Carl |last3= Tuecke |first3= Steven
|year= 2001
|title= The anatomy of the grid: enabling scalable virtual organizations
|journal= International Journal of High Performance Computing Applications
|volume= 15
|issue= 3
|pages= 200–222
|doi=10.1177/109434200101500302
|url= http://www.globus.org/alliance/publications/papers/anatomy.pdf
|access-date= April 10, 2012|citeseerx= 10.1.1.24.9069 |bibcode= 2001cs........3025F |arxiv= cs/0103025 |s2cid= 28969310 }}
- {{cite web
|last1=Foster
|first1=Ian
|last2=Kesselman
|first2=Carl
|last3=Nick
|first3=Jeffrey M.
|last4=Tuecke
|first4=Steven
|title=The physiology of the grid: an open grid services architecture for distributed systems integration
|date=June 22, 2002
|url=http://forge.gridforum.org/sf/go/doc13483?nav=1
|access-date=May 10, 2012
|url-status=dead
|archive-url=https://web.archive.org/web/20080322035911/http://forge.gridforum.org/sf/go/doc13483?nav=1
|archive-date=March 22, 2008
}}
- {{cite journal
|last1= Hancock
|first1= B.
|year= 2009
|title= A simple data grid using the Inferno operating system
|journal= Library Hi Tech
|volume= 27
|issue= 3
|pages= 382–392
|doi= 10.1108/07378830910988513
}}
- {{cite web
|last1=Hoschek
|first1=W.
|last2=McCance
|first2=G.
|title=Grid enabled relational database middleware
|publisher=Global Grid Forum
|date=October 10, 2001
|url=http://ppewww.ph.gla.ac.uk/preprints/2001/11/GGF3Rome2001.pdf
|access-date=April 22, 2012
|url-status=dead
|archive-url=https://web.archive.org/web/20060128234459/http://ppewww.ph.gla.ac.uk/preprints/2001/11/GGF3Rome2001.pdf
|archive-date=January 28, 2006
}}
- {{cite web
|last1= Kunszt
|first1= Peter Z.|last2= Guy |first2= Leanne P.
|title= The open grid services architecture and data grids
|date = July 7, 2002
|url = http://www.computing.surrey.ac.uk/courses/csm23/Papers/data_grid.pdf
|access-date = May 10, 2012 }}
- {{cite web
|last1= Moore
|first1= Reagan W.
|title= Evolution of data grid concepts
|url= http://www.nesc.ac.uk/events/GGF10-DA/programme/papers/06-Moore-Grid-evolution.pdf
|access-date= May 10, 2012
|archive-url= https://web.archive.org/web/20140212000149/http://www.nesc.ac.uk/events/GGF10-DA/programme/papers/06-Moore-Grid-evolution.pdf
|archive-date= February 12, 2014
|url-status= dead
}}
- {{cite conference
|last1= Kettimuthu
|first1= Rajkumar |last2= Allcock |first2= William |last3= Liming |first3= Lee |last4= Navarro |first4= John-Paul |last5= Foster |first5= Ian
| title = GridCopy: moving data fast on the grid
| book-title = International parallel and distributed processing symposium (IPDPS 2007)
| pages = 1–6
| publisher = IEEE International
| date = March 30, 2007
| location = Long Beach
| url = http://www.globus.org/alliance/publications/papers/GridCopy.pdf
| access-date = April 29, 2012 }}
- {{cite journal
|last1= Thenmozhi
|first1= N. |last2= Madheswaran |first2= M.
|year= 2011
|title= Content based data transfer mechanism for efficient bulk data transfer in grid computing environment
|journal= International Journal of Grid Computing & Applications
|volume= 2
|issue= 4
|pages= 49–62
|doi= 10.5121/ijgca.2011.2405
|issn= 2229-3949
|url= https://www.scribd.com/doc/78611092/Content-Based-Data-Transfer-Mechanism-for-Efficient-Bulk-Data-Transfer-in-Grid-Computing-Environment
|access-date= April 28, 2012|doi-access= free
}}
- {{cite journal
|last1=Tu
|first1=Manghui
|last2=Li
|first2=Peng
|last3=Yen
|first3=I-Ling
|last4=Thuraisingham
|first4=Bhavani
|last5=Khan
|first5=Latifur
|year=2010
|title=Secure data objects replication in data grid
|journal=IEEE Transactions on Dependable and Secure Computing
|volume=7
|issue=1
|pages=50–64
|doi=10.1109/tdsc.2008.19
|s2cid=8934783
|url=http://www.utdallas.edu/~lkhan/papers/Secure_Data_Objects_Replication_in_Data_Grid.pdf
|access-date=April 26, 2012
}}{{dead link|date=December 2016 |bot=InternetArchiveBot |fix-attempted=yes }}