Use Case Overview
NICT Science Cloud has combined the large-scale data processing capabilities of Gfarm/Pwrake with HpFP, a high-speed data transfer protocol. As a result, massive scientific datasets can be not only stored as "archived data" but also analyzed and visualized immediately as "real-time data," greatly accelerating scientific research and its application to society.
This case study describes how the Science Cloud of the National Institute of Information and Communications Technology (NICT) achieved real-time data processing and visualization by using Gfarm/Pwrake together with HpFP, a high-speed data communication protocol developed in-house.
Challenge
Bottlenecks in Big Data "Immediacy" & "Visualization"
Extremely large datasets (sensor data, simulation output, video, and more) are collected every day in diverse scientific fields such as space science, Earth environmental science, and the human sciences.
With conventional data processing, visualizing large amounts of data took a very long time. In on-demand systems, which process data only after a user requests it, both the processing time and the time needed to transmit the visualization results were bottlenecks: obtaining a result took tens of seconds, and most of the communication time was spent transmitting image files.
A second major challenge concerned the traceability of important scientific data. Conventional timestamping systems could not keep up with high-speed, high-capacity cloud databases, which limited how much data could be processed.
Solution
60x Speed Improvement, Instant Data Display, & Complete Traceability
NICT Science Cloud made data usable immediately by building on the Gfarm distributed file system and the Pwrake parallel workflow system, combined with technologies developed in-house.
Achieving High-Speed Real-Time Processing
1. Dramatic Speed Improvement Through Parallel Processing
Pwrake speeds up processing substantially by scheduling tasks with priority on file locality, that is, by running each task on a node that already stores its input data in Gfarm. In real-time 3D processing of phased array radar data, this cut processing that previously took 20 minutes down to 20 seconds, a 60x speedup (1,200 s / 20 s). As a result, 3D visualizations became available within 70 seconds of observation.
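To make the idea of locality-first scheduling concrete, here is a minimal sketch of the general technique. It is not Pwrake's actual scheduling algorithm, and the function name `schedule_by_locality` and the task, file and node names are hypothetical. It assumes each task reads a single input file and that we know which nodes hold a replica of each file.

```python
from collections import defaultdict


def schedule_by_locality(tasks, replicas, nodes):
    """Greedy, locality-first task placement (illustrative only).

    tasks:    dict mapping task name -> name of its input file
    replicas: dict mapping file name -> set of nodes holding a replica
    nodes:    list of compute node names
    Returns a dict mapping node name -> list of task names assigned to it.
    """
    load = {n: 0 for n in nodes}
    plan = defaultdict(list)
    for task, infile in tasks.items():
        holders = replicas.get(infile, set())
        local = [n for n in nodes if n in holders]
        # Prefer nodes that already store the input (local read); if none do,
        # fall back to every node (the task must then read its input remotely).
        candidates = local or nodes
        # Among the candidates, pick the least-loaded node; break ties by name.
        target = min(candidates, key=lambda n: (load[n], n))
        plan[target].append(task)
        load[target] += 1
    return dict(plan)


if __name__ == "__main__":
    tasks = {"t1": "a.dat", "t2": "b.dat", "t3": "c.dat", "t4": "a.dat"}
    replicas = {"a.dat": {"node1"}, "b.dat": {"node2"}, "c.dat": {"node2", "node3"}}
    print(schedule_by_locality(tasks, replicas, ["node1", "node2", "node3"]))
    # -> {'node1': ['t1', 't4'], 'node2': ['t2'], 'node3': ['t3']}
```

In the example, `t4` stays on `node1` even though that node is already busy, because it holds `a.dat`. Trading data movement against load balance in this way is the central tension that any locality-aware scheduler has to manage.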
2. Fast Visualization & Viewing Experience
An in-house multidimensional, multi-layered image database reduced the display time for big data from tens of seconds to about 1 second. The result is a viewer that responds smoothly to panning (swipe) and zooming in and out (double-click/pinch), much like Google Maps.
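The internal layout of the image database is not described here. However, the Google Maps comparison points to a tiled, multi-resolution image pyramid, which is the standard technique behind such viewers. The following is a minimal sketch of that general technique, not NICT's implementation. The 256-pixel square tile size and all function names are hypothetical.

```python
def pyramid_levels(width, height, tile=256):
    """Levels needed so that the coarsest level fits in a single tile."""
    levels, size = 1, max(width, height)
    while size > tile:
        size = (size + 1) // 2  # each coarser level halves the resolution
        levels += 1
    return levels


def tile_for_pixel(x, y, level, levels, tile=256):
    """(column, row) of the tile containing full-resolution pixel (x, y).

    Level 0 is the coarsest overview; level `levels - 1` is full resolution.
    """
    shift = (levels - 1) - level  # number of halvings from full resolution
    return (x >> shift) // tile, (y >> shift) // tile


def visible_tiles(x0, y0, x1, y1, level, levels, tile=256):
    """Tiles at `level` overlapping the full-resolution viewport [x0, x1) x [y0, y1)."""
    c0, r0 = tile_for_pixel(x0, y0, level, levels, tile)
    c1, r1 = tile_for_pixel(x1 - 1, y1 - 1, level, levels, tile)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


if __name__ == "__main__":
    n = pyramid_levels(1000, 600)                         # 3 levels: 250, 500, 1000 px wide
    print(visible_tiles(0, 0, 1000, 600, 0, n))           # zoomed out -> [(0, 0)]
    print(len(visible_tiles(0, 0, 1000, 600, n - 1, n)))  # full resolution -> 12
    print(visible_tiles(512, 256, 768, 512, n - 1, n))    # zoomed in -> [(2, 1)]
```

A pan or zoom only fetches the few tiles currently in view. Display cost therefore depends on the viewport size rather than the total dataset size, which is why tiled viewers stay responsive as the data grows.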
High-Speed Remote Access & Data Assurance
3. Proprietary High-Speed Transmission Protocol (HpFP)
Image transmission was the main communication bottleneck. To address it, NICT developed and deployed HpFP, a proprietary protocol that is robust against packet loss and delay. HpFP enables fast reads and writes to cloud storage even from remote sites, and its remote storage access is faster than that of conventional protocols such as UDT (up to 12.8 Gbps with 6-way parallelism).
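HpFP's internals are not described here, and the sketch below does not model HpFP or UDT. It is general background on why loss and delay matter. The well-known Mathis et al. approximation bounds the steady-state throughput of a single standard (Reno-style) TCP flow, and it shows that on a long, slightly lossy path such a flow is capped far below link capacity. The segment size, round-trip time and loss rate used below are hypothetical example values.

```python
import math


def mathis_tcp_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate throughput ceiling (bits/s) of one Reno-style TCP flow.

    Mathis et al. approximation: rate ~ (MSS / RTT) * (C / sqrt(p)),
    with C = sqrt(3/2) for periodic loss and no delayed ACKs.
    """
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # 1460-byte segments, 100 ms round trip, 0.01% packet loss:
    bps = mathis_tcp_throughput_bps(1460, 0.100, 1e-4)
    print(f"{bps / 1e6:.1f} Mbps")  # -> 14.3 Mbps, regardless of link capacity
```

The ceiling falls in inverse proportion to the round-trip time and with the square root of the loss rate. Protocols designed to tolerate loss and delay, such as UDT and HpFP, aim to perform well in exactly this regime.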
4. Ensuring Integrity & Authenticity
Linking the distributed storage system to a timestamp service established data traceability. The resulting system can prove both that important data has not been tampered with (integrity) and that its recorded creation time is correct (authenticity).
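The timestamp service and token format used by Science Cloud are not specified here. The general pattern, as in RFC 3161 trusted timestamping, has two steps. First, hash the data and have a trusted authority sign the hash together with the current time. Later, re-hash the data and check it against the signed token. The minimal sketch below follows that pattern under one loud assumption: an HMAC with a shared secret stands in for the authority's public-key signature, so it is illustrative only. The names `TSA_KEY`, `issue_token` and `verify` are hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical stand-in for a timestamp authority's signing key. A real
# RFC 3161 authority signs with a private key and publishes a certificate.
TSA_KEY = b"hypothetical-tsa-secret"


def _sign(digest: str, issued_at: int) -> str:
    # Bind the data digest and the time together, then sign the pair.
    payload = json.dumps({"sha256": digest, "time": issued_at}, sort_keys=True).encode()
    return hmac.new(TSA_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(data: bytes, now: Optional[float] = None) -> dict:
    """Timestamp token for `data`: its SHA-256 digest, a time, and a signature."""
    digest = hashlib.sha256(data).hexdigest()
    issued_at = int(time.time() if now is None else now)
    return {"sha256": digest, "time": issued_at, "sig": _sign(digest, issued_at)}


def verify(data: bytes, token: dict) -> bool:
    """True only if the token is genuine and `data` still matches its digest."""
    if not hmac.compare_digest(_sign(token["sha256"], token["time"]), token["sig"]):
        return False  # digest or time in the token was altered (authenticity)
    # Integrity: the data must still hash to the digest that was timestamped.
    return hmac.compare_digest(hashlib.sha256(data).hexdigest(), token["sha256"])


if __name__ == "__main__":
    scan = b"radar volume scan 0001"
    token = issue_token(scan, now=1_600_000_000)
    print(verify(scan, token))                                   # True
    print(verify(b"radar volume scan 0002", token))              # False: data changed
    print(verify(scan, dict(token, time=token["time"] - 3600)))  # False: time forged
```

In hash-based timestamping generally, only a fixed-size digest is sent to the authority. Its workload therefore does not grow with file size, which is what makes the approach attractive for large-capacity storage.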
Future Development
We aim to apply the results of Science Cloud to societal challenges. For example, Himawari-8 Real-time Web is already used in many ways, including disaster prevention, education, and news reporting, and we are expanding its international reach with multilingual support for the Asia-Oceania region. Looking ahead, we aim to pioneer new analytical methods built on secure web technologies, such as techniques that overlay changes in "social" and "individual" awareness, in the way that newspaper article databases are used.
We plan to incorporate HpFP into web browsers to speed up image transmission even on networks with high latency and frequent packet loss. The multidimensional, multi-layered image database for time-series data is planned to grow from the current 1 billion images to 100 billion images within five years. We are also expanding data use in fields directly tied to social infrastructure, such as solar power generation forecasting.