Prof Junzhou Luo AMS Mass Data Processing Grid


Published on October 15, 2007

Author: Gulkund


AMS Mass Data Processing Grid:
Luo Junzhou, School of Computer Science and Engineering

Outline:
Background of AMS experiment
Mass data processing grid
AMS data processing grid platform
Related research work on SEUGrid
Future work in AMS data processing

What's AMS Experiment (1):
The AMS (Alpha Magnetic Spectrometer) experiment, led by Nobel Prize winner Professor Samuel C. C. Ting, is a large-scale international collaborative project. More than 300 scientists from 15 countries and regions, including the USA, Russia, Germany, France and China, participate in the experiment. The participants include many world-famous scholars and universities, such as Massachusetts Institute of Technology, University of Geneva and University of Perugia.

What's AMS Experiment (2):
The AMS experiment is the only large physics experiment on the International Space Station. It is the first accurate measurement in space of high-energy charged particles and nuclei. The purpose of the AMS experiment is to search for the origin of dark matter, the origin of cosmic rays, and a universe made of antimatter.

AMS-02 Detector:

AMS-02 Data Volume:
STS91 ISS

AMS-02 Data Classification:
Health & Status Data: status of the detector (magnet, power, temperature, DAQ state); rate < 1 Kbit/sec; needed in real time (RT) by the AMS Payload Operation and Control Center (POCC), the ISS crew and NASA ground.
Monitoring Data: all slow control data from all slow control sensors; data rate ~ 10 Kbit/sec; needed in near real time (NRT) by the AMS POCC, with a complete copy "later" (close to NRT) for science analysis.
Science Data: events and subdetector calibrations; samples of approx. 10% go to the POCC to monitor detector performance in RT, with a complete copy "later" (close to NRT) to the SOC for event reconstruction and physics analysis; 2 Mbit/sec orbit average.
Flight Ancillary Data: ISS latitude, altitude, speed, etc.; rate 2 Kbit/sec.

Collaboration Between SEU & AMS:
Southeast University is the first mainland university to participate in AMS, under approval granted by the Chinese government in October 2003.
Under the Collaboration Agreement between SEU and the AMS-02 Experiment, SEU is to:
Set up the SEU AMS experiment system on the ground (AMS-C);
Set up the SEU AMS-02 Antimatter Investigation System (AMS-AIS);
Set up the SEU AMS-02 Science Operations Center (AMS-SOC).

SEU AMS-SOC:
The mission of SEU AMS-SOC:
Mass data storage system
Parallel computing environment
Data analysis and computing system
Set up a mass data storage system with a capacity of more than 420 TB, and a high-performance data processing system based on new cluster and distributed computing technologies, to meet the needs of real-time or near-real-time processing of large-scale observational data and of off-line physics analysis.

Outline:
Background of AMS experiment
Mass data processing grid
AMS data processing grid platform
Related research work on SEUGrid
Future work in AMS data processing

Data Grid Overview (1):
European DataGrid
DataGrid is led by CERN, together with five other main partners and fifteen associated partners. It aims to enable access to geographically distributed computing power and storage facilities belonging to different institutions.

Data Grid Overview (2):
GriPhyN (Grid Physics Network)
GriPhyN is developed by a team of experimental physicists and IT researchers who plan to implement the first petabyte-scale computational environments for data-intensive science.
PPDG (Particle Physics Data Grid)
PPDG is a collaboration of computer scientists with a strong record in distributed computing and Grid technology, and physicists with leading roles in the software and network infrastructures for major high-energy and nuclear experiments.

Data Grid Overview (3):
DataTAG
Provides high-performance networking between Geneva, Switzerland and Chicago, U.S. (2.5 Gbps leased line), and focuses on interoperability between these intercontinental Grid domains.
iVDGL (International Virtual Data Grid Laboratory)
A global Data Grid that will serve forefront experiments in physics and astronomy. Its computing, storage and networking resources in the US, Europe, Asia and South America provide a unique laboratory that will test and validate Grid technologies at international scales.

DataGrid Application Background:
Specific application backgrounds:
High Energy Physics (HEP), led by CERN (Switzerland)
Biology and Medical Image processing, led by CNRS (France)
Earth Observations (EO), led by ESA/ESRIN (Italy)
DataGrid mainly targets CERN's high-energy physics, addressing mass data storage, partitioning and processing, and extends to EO and bioinformatics processing. Applications, especially the future LHC application, are the basis for developing DataGrid. Solving the LHC application would be a strategic victory for DataGrid research.

DataGrid Hierarchy:

Mass Data Processing in DataGrid:
The particle detector produces raw data on the order of PB/s. After filtering by the on-line system and processing by off-line processors with a capability of 20 TIPS, the data is finally written to tape at 100 MB/s. It is this data on tape that DataGrid actually processes. The CERN Computing Center is responsible for dispatching the data to Area Centers in Europe, North America and Japan over high-speed networks.
The Area Centers further partition this huge volume of data, so that the data stream has decreased to about 1 MB/s by the time it reaches a physicist's desktop, where it can be processed easily.

DataGrid Architecture:

Outline:
Background of AMS experiment
Mass data processing grid
AMS data processing grid platform
Related research work on SEUGrid
Future work in AMS data processing

Present hardware at SEU:
School of Computer Science & Engineering
Jiangsu Provincial Key Laboratory of Network and Information Security
State Key Laboratory of Microwave
Campus network center
Library information center
CERNET Eastern China (North) Regional Network Center
Connection to ChinaGrid

Hardware List:

Connection to ChinaGrid:
Connection to ChinaGrid through 1-Gigabit routers and switches
ChinaGrid grid middleware CGSP deployed

ChinaGrid:
China Education and Research Grid
Funded by the Ministry of Education
Based on CERNET
First phase: from 2003 to 2006
12 key universities initially
More than 6 Tflops with 60 TB
20 key universities by the end of 2004

ChinaGrid Members:

ChinaGrid Main Tasks:
Campus grid platform
Common platform for ChinaGrid
Grid application platform and representative grid applications:
Image processing grid
Bioinformatics grid
Course on-line grid
Computational fluid dynamics grid
Large scale information processing grid

ChinaGrid Specific Application Grid:

ChinaGrid Supporting Platform: CGSP:
Grid Security

CGSP in Details:
Labels from the architecture diagram, in extraction order: Grid development environment Grid Portal Portal toolkits GridPPI Resource pack tool Installation packs Manage GUI Job define toolkit Domain management Info service Service metadata management User management Policy negotiate virtual interface Service metadata Service match QoS management info collect Fault-tolerant monitor Data management Service container Unified file interface Data storage proxy SRB File Metadata management Replica policy Replica directory Job submit Service deploy Workflow management Job status monitor Job sched Job SLA management Service Remote deploy Service registry lifecycle SOAP Service monitor Status report Res SLA management Inter-domain id map Info management Info metadata search tech Info class

SEUGrid:
Nine people were sent to Switzerland, the USA and Italy to work with foreign experts, and spent two years at CERN designing the AMS-SOC data processing system.
We attended the AMS TIM in Switzerland and the USA 19 times.
Professor Samuel C. C. Ting and nine foreign experts involved in the AMS experiment visited SEU and discussed the requirements and system design of AMS-SOC at SEU.
Based on the above work, we developed a grid platform called SEUGrid for AMS mass data processing and analysis.

SEUGrid:
SEU Gridport: the portal of the grid platform for mass data processing and analysis. It is in charge of task submission, querying the state of computing nodes, and collecting computing results.
SEUGrid computing nodes: receive tasks from the portal and generate computing results.

SEUGrid Architecture:

SEUGrid Functions:
Authentication and login of grid users
Query of computing resource status
Submission, scheduling and execution of computing tasks, tracing of execution status, and real-time logging
Transmission of computing results, and distributed storage and management of mass data
Remote invocation of commands

SEUGrid Function (1):
Authentication and login of grid users
SEUGrid provides single sign-on and authentication to remote hosts based on GSI. The user supplies the MyProxy host, user name and password to the portal, and the portal can use this information to retrieve the trusted credential held in MyProxy. Thus, when a user logs in, the portal can create a valid proxy for the user to execute tasks.
At the same time, the portal creates a session for the user, which keeps the user's state until logout.

SEUGrid Function (2):
Query of computing resource status
A user can query static information about computing resources, such as CPU, memory, running processes and supported software, through MDS (Monitoring and Discovery Service), which is based on LDAP and the information providers' services. The user can also dynamically query the status of the services available at each computing node by using the Ping grid command.

SEUGrid Function (3):
Submission, scheduling and execution of computing tasks, tracing of execution status, and real-time logging
A user can submit MC simulation arguments and start the computing tasks through the MC simulation Portlet. The scheduling module at the back end allocates each task to a suitable computing node, the task starts executing in the GT environment of that remote node, and the real-time log is returned as a stream. The portal host keeps its connections to the computing nodes open, so the user can trace execution status, query current results and manage the related task logs.

SEUGrid Function (4 & 5):
Transmission of computing results, and distributed storage and management of mass data
SEUGrid can collect the computing results of a task automatically, or users can manage the result files themselves; both rely on the GridFTP service at each computing node.
Remote invocation of commands
SEUGrid supports remote invocation of commands.
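
The allocation step in SEUGrid Function (3) can be illustrated with a small sketch. The slides do not say how the back-end scheduling module picks a "suitable" node, so the least-loaded rule, the `Node` class, the `alive` flag and the node and task names below are all assumptions made for illustration, not SEUGrid's actual scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical portal-side view of one computing node."""
    name: str
    alive: bool = True  # assumed result of a ping-style availability check
    tasks: list = field(default_factory=list)


def schedule(task_ids, nodes):
    """Assign each task to the reachable node that currently holds the fewest tasks."""
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        raise RuntimeError("no reachable computing node")
    assignment = {}
    for task_id in task_ids:
        # min() returns the first node with the smallest task count
        target = min(candidates, key=lambda n: len(n.tasks))
        target.tasks.append(task_id)
        assignment[task_id] = target.name
    return assignment


nodes = [Node("node-a"), Node("node-b"), Node("node-c", alive=False)]
plan = schedule(["mc-1", "mc-2", "mc-3"], nodes)
print(plan)
```

Unreachable nodes are filtered out before any task is placed, which mirrors the idea of checking node status before submission; a real scheduler would also weigh resource information such as CPU and memory.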
MC Production:

Features of MC Simulation Computation:
Huge computation: a 1,000,000-event simulation takes 1000 hours (Intel Pentium 4, single processor)
Good parallelism: coarse granularity, easy to divide
Large-scale data: 206 TB in total

AMS MC Software and Configuration:
ams02mcdb ams02mcdb.addon bbftp files CRC Linux execs AMS mysql book-keeping execs & docs

AMS MC Simulation Flow:

SEUGrid Portal:

Task Submission:

Task Monitoring:

Result Recollection:

Result Retrieving:

Final Results:

Current MC Simulation Results:

Outline:
Background of AMS experiment
Mass data processing grid
AMS data processing grid platform
Related research work on SEUGrid
Future work in AMS data processing

Related Research Work on Grid:
Trust-based and QoS-measured scheduling algorithm
QoS-based grid resource management
Dividing grid service discovery into 2-stage matchmaking
Predict-based and cost-based replica replacement algorithm
Semantic access control in grid computing
Grid security policy implementation model and dynamic authorization with feedback

Trust-Based and QoS-Measured Scheduling:
Current scheduling algorithms, such as the 2-Phase Scheduling Strategy and the Co-RSPB, Co-RSBF and Co-RSBFR algorithms based on priority and the Best Fit mechanism, do not jointly consider QoS requirements, scheduling efficiency and the dynamic characteristics of VOs or networks.
Through the trust degree, the trust model captures many factors, such as task quantity, task arrival rate, waiting-queue length, diversity of network structure and robustness of computing nodes.
Building on the trust model, the scheduling strategy always selects nodes with a high trust degree, so as to obtain the most stable and efficient computing nodes.
This scheduling decreases the actual response time of tasks and improves the dynamic efficiency of task computing.

Definition of Trust:
Definitions of Direct Trust, Reputation and Trust from source node i to destination node j

Comparison: TB&QMS and NSA:
Comparison between TB&QMS and NSA

QoS-based Grid Resource Management:
Application/Grid Service layer: provides descriptive QoS parameters, such as Security QoS, Reliability QoS and Accounting QoS, and meets end users' simple QoS requests.
Virtual Organization (VO) layer: maps users' QoS requirements to Grid QoS, and groups similar QoS parameters of the Physical layer according to their properties.
Physical layer: maps the QoS of the VO layer to the Physical layer.

Hierarchical Structure of Grid QoS:

Structure of GRAM based on QoS:

Features of GRAM-QoS Model:
Guarantees users' QoS requirements
By mapping, converting and negotiating the QoS parameters, GRAM-QoS can establish the user's QoS requirements during resource allocation management. Through QoS admission control, GRAM-QoS can avoid invalid resource allocation and balance the workload across resources. GRAM-QoS can therefore not only fulfill the user's QoS requirements but also improve the efficiency of resource allocation and system performance.
Offers excellent scalability and compatibility
Scalability: all modules of GRAM-QoS can be customized by grid developers.
Compatibility: the model is compatible with other models.

Model of Service Discovery:
Dividing grid service discovery into 2-stage matchmaking.
The service matching process is divided into two stages, service type matching and instance matching, and a grid service discovery model based on 2-stage matching is proposed.
In the model, the VO is regarded as the management unit for grid services, and a two-level publication architecture is adopted. The simulation results show that the model can effectively aggregate service information and avoid the workload caused by frequent dynamic updates.

Process of Service Discovery:

Data Replica Management:
In SEUGrid, the data management service mainly adopts a predict-based and cost-based replica replacement algorithm, the PC-based algorithm. It predicts the hot-spot replicas within a time window and keeps only part of the copies, which both improves access speed and saves storage space. We focus on the cost of replacing copies, in terms of network delay, bandwidth, copy size and system reliability.

Simulation Results (1):
Simulation results of the LRU, LFU and PC-based algorithms: mean job time

Simulation Results (2):
Simulation results of the LRU, LFU and PC-based algorithms: bandwidth consumption

Semantic Access Control:

Features of Semantic Access Control:
An ontology can specify heterogeneous, distributed and semi-structured information well, which suits the highly distributed and dynamic nature of the Grid.
The security policies and security attributes of resources and entities can be expressed clearly through an ontology-based lexical description, which helps the exchange of security information among heterogeneous systems.
Based on the semantic description, the logic layer supplies the rules used in semantic reasoning. Access control decisions are made according to the results of semantic reasoning.
This makes the grid access control mechanism more intelligent and dynamic, and enables fine-grained access control.

Grid Security Policy Implementation Model:
Resource sharing in the Grid must be controlled rather than carried out at will. The security policy implementation model can be established through negotiation between the resource provider and the resource user, and the security policy is then enforced within the model.
Because the grid is dynamic, the security mechanism used in it cannot be static. It should be convenient to modify and configure.
We propose a security policy implementation model, SPIM. The model defines the functional entities and the communication processes between them, and stipulates the warrants used in the communication.
The global security policy of the VO and the local security policy of a resource's administrative domain can be made, modified and implemented separately in the model, while consistent implementation of policies at all levels is guaranteed.

Dynamic Authorization with Feedback:
A dynamic authorization mechanism for the grid is proposed in SPIM. Through negotiation between users and resources, the Grid security management entities bind their requirements together according to the security policies and then make the authorization decision.
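
The two-stage matchmaking described under Model of Service Discovery can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not define the matching rules, so exact-name type matching, the `free_cpus` attribute used for instance matching, the dictionary-based registry and all service and host names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    """Hypothetical record for one deployed instance of a grid service."""
    service_type: str
    host: str
    free_cpus: int  # assumed dynamic attribute checked in stage 2


def discover(registry, wanted_type, min_free_cpus):
    """Two-stage lookup: match the service type first, then match instances."""
    # Stage 1: service type matching against the type-level index.
    instances = registry.get(wanted_type, [])
    # Stage 2: instance matching on the instances' current state.
    matches = [i for i in instances if i.free_cpus >= min_free_cpus]
    return sorted(matches, key=lambda i: i.free_cpus, reverse=True)


# Type-level index (stage 1) mapping to per-type instance lists (stage 2).
registry = {
    "mc-simulation": [
        ServiceInstance("mc-simulation", "node-a", 2),
        ServiceInstance("mc-simulation", "node-b", 8),
    ],
    "gridftp": [ServiceInstance("gridftp", "node-c", 4)],
}
hosts = [i.host for i in discover(registry, "mc-simulation", 4)]
print(hosts)
```

Because stage 1 consults only the type index, frequently changing instance attributes need to be examined only for the types that actually match, which is one plausible reading of how a two-level publication scheme limits update workload.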
Outline:
Background of AMS experiment
Mass data processing grid
AMS data processing grid platform
Related research work on SEUGrid
Future work in AMS data processing

Future SEUGrid Structure:

Data Management of SEUGrid:

Deep Processing of AMS Data (1):
Scheduling strategy based on workload balance
The raw data transmitted from GSC at a rate of about 3 to 4 Mbits/s should be stored in the databases. In addition, the reconstructed data and tagged data derived from this raw data (about 44 TB and 0.6 TB per year respectively) and the synchronous MC data (about 44 TB per year) should also be stored. A storage node would be chosen by a reasonable scheduling algorithm based on factors such as storage node capacity, network bandwidth and latency.
Mass data index and compression
80% of the raw data and 20% of the ESD data should be kept on disk for direct access. Research on mass data indexing and compression is needed to support real-time access.

Deep Processing of AMS Data (2):
Quick data classification and categorization
All the data should be categorized to support non-direct access. Research on mass data classification and categorization is needed.
Task decomposition, scheduling and collaboration
MC computing is large in scale and coarse in granularity. Decomposing the task and executing the sub-tasks in the grid environment would save much time. 10% of the raw data should be used to reconstruct the events within half an hour. Research on task scheduling and collaboration under real-time conditions is needed.

Deep Processing of AMS Data (3):
Fault tolerance
Eliminate failures of application computing caused by lost network messages, software execution faults and tampered messages.
Data visualization, virtualization and data mining
Research on techniques such as distributed visualization and collaborative visualization of scientific computation is needed to support visual analysis of AMS-02 data. Research on mass data mining technology is needed to uncover undiscovered phenomena and laws in the AMS-02 mass data.

Deep Processing of AMS Data (4):
Monitoring of transmission results of the AMS experiment
Data storage, physics analysis and software development for the AMS experiment can be done on different types of computers. To manage and use these computers more efficiently, a network performance monitoring system needs to be developed.
Security and access control
The AMS experiment involves over 300 scientists from 15 countries and regions. Because AMS data is highly distributed and large in scale, a series of advanced security monitoring, defense and control technologies is needed. Research on semantic access control, combined with new semantic web technology, is needed to provide grid security technologies suited to a dynamic, heterogeneous and distributed SEUGrid environment.

Thanks! Q&A!
