Gfarm File System

A Japan-born wide-area distributed file system supporting the era of big data

Overview

Gfarm (pronounced "G-farm") is a wide-area distributed file system that has been under continuous research & development since around 2000. It bundles together local storage from multiple PCs & multiple PC clusters distributed across wide areas to function as a large-scale, high-performance shared file system.

Users can access data through a single virtual directory hierarchy without being aware of where the data is actually stored. This mechanism meets the needs of fields such as physics, astronomy, & life sciences to efficiently process & share extremely large datasets ranging from terabytes to petabytes.

Gfarm ensures transparency & security through ongoing research & open-source code availability. The NPO Tsukuba OSS Technical Support Center promotes technical support, advanced maintenance & construction support, & sharing of technical information for this Japanese-originated OSS centered on the Gfarm File System.

Gfarm System Configuration (3 Types of Hosts)

The Gfarm File System is primarily composed of the following three types of hosts (computers), though a single host can serve multiple roles if the number of available hosts is limited.

  • Client: Hosts that use the Gfarm File System.
  • File system node: A group of hosts that provide actual data storage space for the Gfarm File System. The system is designed to support configurations with thousands or tens of thousands of nodes distributed across wide areas, with an I/O daemon called gfsd running on each node. These nodes typically also serve as clients using Gfarm.
  • Metadata server: A host that manages metadata for the Gfarm File System (information such as which files are stored where). It runs a metadata server daemon called gfmd & a backend database such as PostgreSQL.
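
Once a client is configured, the hosts that make up a Gfarm File System can be inspected with the standard management commands. A minimal sketch (assuming a working client configuration):

  $ gfmdhost -l    # list the registered metadata server(s)
  $ gfhost -lv     # list file system nodes with load & connectivity information
  $ gfdf           # show capacity & usage across the file system nodes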

Features & Benefits

The Gfarm File System is like "a massive, fault-tolerant data repository accessible from anywhere." It incorporates unique innovations to meet the needs of diverse users, including researchers handling large-scale data & IT administrators who prioritize stable system operation.

1. Peace of Mind That Data Won't Be Lost

Important data is replicated in multiple locations, making it resilient to disasters.

Data is not stored in just one location but is replicated across multiple remote sites. For example, in the "HPCI Shared Storage" for supercomputers, roughly 100 petabytes of files are replicated and stored at two sites: Kashiwa in Chiba Prefecture and Kobe in Hyogo Prefecture. Even if a problem occurs at one site, data remains accessible from the other, preventing system downtime.

2. Unlimited Expansion Potential

The system can be expanded without downtime even as data volume increases.

Storage can be increased or decreased without stopping the system. This allows data storage capacity and processing power to be expanded or reduced as needed. Additionally, the number of replicas and replica placement locations can be freely configured according to data importance and access frequency, making it possible to avoid concentrated data access.
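
For example, the required replica count can be set per directory or per file with the gfncopy command (a sketch; the path is illustrative):

  $ gfncopy -s 3 /projectX/rawdata    # require at least 3 replicas for files under this directory
  $ gfncopy /projectX/rawdata         # show the replica count currently configured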

3. Secure Access from Anywhere

Data can be used from anywhere in the world as if it were on your own computer.

Gfarm's distinguishing feature, compared with an ordinary file system, is that it uses the Internet to provide secure file access even from remote locations. As with cloud storage, you can reach your files from anywhere.
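
For instance, using the gfarm2fs program listed under Source Code below, a client can mount the Gfarm File System & work with it like a local directory (a sketch; the mount point is illustrative):

  $ mkdir -p /tmp/gfarm
  $ gfarm2fs /tmp/gfarm          # mount the Gfarm File System via FUSE
  $ ls /tmp/gfarm                # ordinary tools now operate on Gfarm files
  $ fusermount -u /tmp/gfarm     # unmount when finished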

4. Guarantee That Data Won't Be Corrupted

Invisible data corruption (silent data corruption) can be automatically detected and prevented.

This function guarantees data integrity, that is, that stored data has not been corrupted. It detects and prevents "silent data corruption," in which data becomes corrupted without any error message being reported. Users are prevented from unintentionally reading corrupted data, and the system recovers from such faults automatically.

Differences from Other Solutions

As a file system developed in Japan with openly available source code, it provides peace of mind. With over 10 years of operational track record in HPCI Shared Storage and JLDG, and with necessary features being continuously developed, it can accommodate a wide range of requirements. It addresses silent data corruption and has automatically detected and repaired numerous instances of corrupted data to date.

Use Cases

Gfarm is used for data utilization on the supercomputer "Fugaku" & in cutting-edge research fields (particle physics, astronomy, etc.).

Technology & Development

Gfarm enhances convenience through secure OAuth/OIDC authentication, the "Gfptar" small file batch transfer function, & Nextcloud support.

Installation & Configuration

To deploy Gfarm & build a large-scale research data infrastructure, configuration of metadata servers, file system nodes, & other components is necessary. The Gfarm File System consists of host groups including clients, file system nodes, & metadata servers. Details are provided in the "Installation Manual" & "Setup Manual". These manuals are available in GitHub repositories & other resources of the NPO Tsukuba OSS Technical Support Center's related communities.

Basic Configuration Flow

Installation & configuration work is performed using Gfarm management commands & configuration files; a combined command sketch follows the steps below.

1. Initial Metadata Server Configuration
  • Decide on an administrator username & configure it using the config-gfarm command (with root privileges).
  • This configuration sets up & starts the backend database (e.g., PostgreSQL), creates configuration files (/etc/gfarm2.conf, /etc/gfmd.conf), & starts the metadata server gfmd.
2. Authentication Method Configuration
For example, when adopting shared key authentication, create a _gfarmfs user for authentication with file system nodes & generate an authentication key (e.g., using the gfkey -f command).
3. Automatic Startup Configuration
Configure the metadata server (gfmd) & backend database (e.g., gfarm-pgsql) to start automatically.
4. Client Configuration & Operation Verification
  • Install the gfarm-client package that contains Gfarm management commands.
  • Use management commands such as gfls (display directory contents), gfuser (user management), & gfgroup (group management) to verify configuration & operation.
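
The following is a minimal sketch of the flow above with all roles on a single host; the administrator name & option values are illustrative, & the authoritative options are described in the manuals:

  $ sudo config-gfarm -A gfarmadm            # step 1: "gfarmadm" is an example admin name
  $ sudo -u _gfarmfs gfkey -f -p 31536000    # step 2: shared key, valid for one year
  $ sudo systemctl enable gfarm-pgsql gfmd   # step 3: service names may differ by packaging
  $ gfls -la /                               # step 4: verify operation from a client
  $ gfuser                                   # list Gfarm users
  $ gfgroup                                  # list Gfarm groups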

Source Code

The source code of the following components is publicly available; each link leads to the corresponding GitHub repository.

  • Gfarm File System: Gfarm File System server & client
  • Gfarm2fs: program for mounting the Gfarm File System
  • Nextcloud-Gfarm: Nextcloud container supporting Gfarm external storage
  • Gfarm-hadoop: plugin for using Gfarm with Hadoop
  • Gfarm-samba: Samba plugin for using Gfarm from Windows clients
  • Gfarm-mpiio: plugin for MPICH & MVAPICH to use Gfarm with MPI-IO
  • Gfarm-gridftp-dsi: plugin for the GridFTP server
  • Gfarm-zabbix: Zabbix plugin for Gfarm fault monitoring

Support

The NPO Tsukuba OSS Technical Support Center was established to provide technical support for Japanese-originated open source software (OSS) centered on the Gfarm File System; it offers advanced maintenance & construction support & shares technical information. For details, please see "Membership & Support".

Frequently Asked Questions

General

When is Gfarm useful?
It allows you to safely store important data. It meets various requirements such as sharing large amounts of data among multiple users.
Can individuals use it?
It can be used by individuals, or shared by small or large groups. It is also possible to publish data globally.
Is security adequate?
When an authentication method with an encrypted communication channel (such as tls, sasl, gsi, or kerberos) is selected, data is encrypted during network transfer, so its contents cannot be seen in transit.
How much can data capacity be increased?
When capacity runs out, it can be increased by adding storage servers.
Will data become corrupted?
When storing large amounts of data, data can sometimes become corrupted even though no error has been reported. This is called silent data corruption & is a serious problem. In Gfarm, stored data is checked for corruption when it is read, & if corruption is detected, an uncorrupted replica is read instead, so corrupted data is never returned even when silent data corruption occurs.
Is support available?
The NPO Tsukuba OSS Technical Support Center provides deployment support, answers to questions, & troubleshooting support.
What does open source mean?
It means that all source code is publicly available. You can investigate what processing is being performed & can also add necessary features.
What is Gfarm's license?
It is the Modified BSD License. Apart from requirements such as retaining the copyright & license notices, it allows essentially free use, modification, & redistribution of the software.
What is Gfarm?
It is software for safely sharing data in wide-area environments. For details, please see Gfarm File System below.
Where can I download Gfarm?
It is distributed from the Gfarm File System source repositories. The latest source code can be obtained with the following commands:
  • Stable version: $ git clone https://github.com/oss-tsukuba/gfarm.git
  • Gfarm2fs: $ git clone https://github.com/oss-tsukuba/gfarm2fs.git
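A typical build from the cloned source trees follows the usual configure & make sequence (a sketch; prerequisite development packages & configure options depend on your platform, so check ./configure --help):
  $ cd gfarm && ./configure --prefix=/usr/local && make && sudo make install
  $ cd ../gfarm2fs && ./configure --prefix=/usr/local --with-gfarm=/usr/local && make && sudo make install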
Is there a mailing list?
You can learn about release information and commit logs by subscribing to the mailing list.

Technical Information

How much memory does gfmd consume?
Approximately 500-1,000 bytes of memory capacity per entry (file or directory) is required. If the metadata server has 256GB of memory, it can manage 250-500 million entries. To manage more entries, you can either increase the metadata server's memory capacity or distribute across multiple Gfarm file systems. Even when distributed across multiple Gfarm file systems, users can seamlessly access different Gfarm systems through symbolic links.
How much disk space does the machine running gfmd consume?
Roughly several times the amount of memory required by gfmd.
How many descriptors should gfmd use?
This value is specified by metadb_server_max_descriptors in gfmd.conf. The calculation method for setting this value is as follows:
  • Nm: Maximum expected number of metadata servers
  • Nf: Maximum expected number of file system nodes
  • Nc: Maximum expected number of concurrent client processes
It must be set to at least the following value:
(Nm - 1) * 2 + Nf * (Nf + 1) + Nc * (Nf + 1) + a small margin
However, the OS may limit the number of descriptors a process can open; if the value above exceeds that limit, raise the OS limit as well.
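For example, with Nm = 3 metadata servers, Nf = 100 file system nodes, & Nc = 1,000 concurrent client processes (illustrative numbers):
(3 - 1) * 2 + 100 * (100 + 1) + 1,000 * (100 + 1) = 4 + 10,100 + 101,000 = 111,104
so metadb_server_max_descriptors should be set to somewhat more than about 110,000 in this case.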

Security

Can Gfarm be operated safely in an environment not protected by a firewall?
Gfarm supports the authentication methods sharedsecret, gsi_auth, gsi, sasl_auth, sasl, tls_sharedsecret, tls_client_certificate, kerberos_auth, & kerberos, but in Internet environments, sharedsecret, gsi_auth, & kerberos_auth cannot be considered safe. We recommend using tls_sharedsecret, tls_client_certificate, sasl, gsi, or kerberos authentication. Please refer to the next item for more information.
What are the differences between authentication methods?
  • "sharedsecret" authentication uses shared key authentication. The shared key must be placed in the home directory of users on all hosts. During authentication, the shared key is not transmitted, but all communication is in plaintext.
  • "gsi_auth" authentication uses X.509 certificates for authentication processing and communicates in plaintext after authentication. "gsi" authentication is an authentication method using X.509 certificates. Communication is encrypted with GSI (Grid Security Infrastructure).
  • "tls_sharedsecret" authentication uses shared key authentication but utilizes TLS-encrypted communication channels. "tls_client_certificate" authentication uses X.509 certificate-based authentication with TLS-encrypted communication channels. GSI proxy certificates can also be used.
  • "sasl_auth" authentication uses SASL authentication. TLS-encrypted communication channels are used during authentication, but communication is in plaintext after authentication. "sasl" authentication uses SASL authentication with TLS-encrypted communication channels.
  • "kerberos_auth" authentication uses Kerberos authentication. Communication is in plaintext after authentication. "kerberos" authentication uses Kerberos authentication with data encryption even after authentication.
  • With "sharedsecret" and "tls_sharedsecret" methods using shared keys, user home directories must be created on all hosts and shared keys must be placed, but with other methods, creating user home directories is not necessary.

Troubleshooting

Cannot connect to file system nodes or metadata server.
In the default configuration, file system nodes use 600/tcp & 600/udp, & metadata server nodes use 601/tcp. In particular, file system nodes displayed as "x.xx/x.xx/x.xx" by "gfhost -lv" have failed to respond via 600/udp. The default port numbers can be changed with options of config-gfarm & config-gfsd; please refer to the installation guide for details. Also check whether the result of a reverse DNS lookup matches the hostname in the server certificate & the name in the auth line of gfarm2.conf. This problem can occur when hostnames without a domain name suffix are registered before hostnames with a suffix in /etc/hosts.
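Before digging into Gfarm-specific settings, basic reachability & name resolution can be checked with standard tools (hostnames & addresses are illustrative):
  $ nc -vz fsnode01.example.org 600     # TCP port of a file system node
  $ nc -vzu fsnode01.example.org 600    # UDP port of a file system node (best-effort check)
  $ nc -vz mds.example.org 601          # TCP port of the metadata server
  $ host 192.0.2.10                     # reverse lookup should match the certificate hostname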
Authentication errors occur during file access or file replica creation, or "no filesystem node" errors occur.
Authentication settings for the file system nodes may not be configured correctly. If you add -dv to the gfsd command line options when starting, gfsd starts in the foreground & outputs detailed messages; resolve the cause of the error according to those messages. In the case of shared key authentication, check whether the same shared key file (~/.gfarm_shared_key) is placed in the user's home directory on every file system node.
A file system node's disk has crashed. What should I do?
If the file replica count is set to 2 or more with the gfncopy command, there is no problem, because replicas exist on other file system nodes. If an application happens to be reading a file from the crashed node, it automatically switches to a replica on another file system node after network_receive_timeout seconds (60 seconds by default), & no error is reported to the application. However, if an application was writing to a file on the crashed node, a write error is returned to that application & the update is lost. After a crash, replicas are automatically created to restore the specified replica count once replica_check_host_down_thresh seconds have passed (3 hours by default). If the file replica count is not set to 2 or more, files whose only replica was on the crashed file system node become inaccessible (although they can still be truncated). When the file system node is restored, a consistency check between the metadata & the spool directory is performed at startup, & inconsistent files are moved to the /lost+found directory.
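To see whether a particular file is affected, the replica locations & the configured replica count can be checked like this (paths are illustrative):
  $ gfwhere /projectX/rawdata/run-001.dat    # list the file system nodes holding replicas
  $ gfncopy /projectX/rawdata                # show the replica count required for this directory
  $ gfncopy -s 2 /projectX/rawdata           # require at least 2 replicas from now on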
File modification times seem incorrect.
The file modification time is the modification time of the file system node that actually created the file. If the time on the file system node is not set correctly, the modification time of files written to that file system node will be incorrect. Please set the time on the file system node correctly using NTP or similar.
How can I collect core files when gfmd or gfsd terminates abnormally?
Configuration examples are given in the Installation Manual; please refer to it.
I have InfiniBand but RDMA doesn't work.
  • Did you specify --with-infiniband for both client & gfsd during configure? For example, is the library displayed with the following command? % ldd /usr/local/lib/libgfarm.so | grep libibverbs.so
  • Is the error message "insufficient memlock size(...), please expand memlock size" appearing? If so, expand the resource limit. Try editing /etc/security/limits.conf & specifying "* hard memlock unlimited".
  • Is the error message "ibv_.... failed, no memory" appearing? If so, either resource limits are applied or the memory lock size is actually too large, & effective RDMA use is not possible. Check the memory lock size of the process in question. % grep VmLck /proc/<pid>/status