How cunoFS solves your object storage problems

Technology in Detail

cunoFS is a high-throughput, POSIX-compatible filesystem for object storage. Files and directories on object storage accessed via cunoFS behave the same as they do on a local disk.

cunoFS is compatible with cloud object stores such as AWS S3 and Azure Blob Storage, as well as on-premises object stores such as MinIO, Dell ECS, and NetApp StorageGRID.

Table of Contents

cunoFS makes object storage as fast as a filesystem
cunoFS does this without introducing bottlenecks or scrambling data
How existing approaches to POSIX compatibility for object storage fall short
How cunoFS removes the limitations of existing solutions while staying compatible with legacy scripts, workflows, and apps
cunoFS Fusion: a solution for the highest-performance use cases
How cunoFS keeps you in control of your data and workflows
cunoFS demonstrated: object storage as a filesystem in action

cunoFS makes object storage as fast as a filesystem

Object storage has quite different characteristics from normal file storage. These characteristics allow object storage to scale far beyond regular filesystems, delivering incredible throughput, and utilizing erasure coding for lower costs. However, object storage is generally not POSIX compatible and has historically been seen as an archival tier rather than as a high-throughput performance tier.
File storage                                             | Object storage
Directory structure                                      | Flat, no directory structure
Scalability is hard                                      | Highly scalable
POSIX compatible: UID/GID, symlinks, hardlinks, ACLs, …  | Not POSIX compatible
Direct access through OS (syscalls)                      | REST API (HTTPS) inside application
Strong consistency guarantees                            | Varies, but usually weaker consistency
Usually single site                                      | Often geo-distributed
RAID, duplication, or other redundancy                   | Erasure coded (highly efficient)
Typically low(er) latency                                | Typically high(er) latency
Fast random-access writes                                | Replace whole object at a time
This table highlights some of the key differences between file storage and object storage.
A common approach to presenting object storage as a POSIX filesystem, used in other commercial and open-source solutions, is to treat object storage more like block storage. Chunks of file data (often deduplicated and compressed) are stored as objects, and a separate database maintains the POSIX metadata and the information needed to reconstitute the chunks back into files.
A competing approach to using object storage for a POSIX filesystem scrambles files and can result in scalability and performance issues.
The issue with this approach is that, as the files are necessarily scrambled on object storage, they are no longer directly accessible. These solutions usually provide a separate S3 gateway (and metadata server) to unscramble the files back from the chunks stored on the object storage. Typically, an NFS gateway is deployed for access by POSIX workloads. The reconstitution back into files can be a huge performance bottleneck, and when multiple nodes are trying to access the gateways, severe scalability issues can cripple performance.

cunoFS does this without introducing bottlenecks or scrambling data

cunoFS radically changes how object storage is used, turning it into a first-class direct tier for POSIX file access, where both POSIX workloads and object-native workloads can directly access object storage (such as S3). cunoFS does this without introducing any gateways and without scrambling the data — each file is directly stored as an object and each object is directly accessible as a file.
cunoFS enables direct, scalable access from both POSIX and object-native workloads
This means workloads can scale across nodes up to the limits of the object storage itself (10+ Tbps in the case of AWS S3). cunoFS takes care of POSIX semantics, including consistency guarantees (for POSIX workloads only), symlinks/hardlinks, POSIX permissions, ACLs, mode, and timestamps. This POSIX metadata is stored directly on the object storage alongside the data – unlike other approaches, no separate database is required.

How existing approaches to POSIX compatibility for object storage fall short

Using object storage as a second-class storage tier limits performance and scalability

There are several commercial POSIX filesystems that use object storage, hiding it behind regular file storage as a second-class storage tier – an archival tier for old or infrequently accessed data.
Hiding object storage (like S3) behind a POSIX filesystem results in slow POSIX performance from object storage.
In these solutions, when something is needed from object storage, it is reconstituted back into the primary POSIX storage layer for access. A user or an application then accesses the reconstituted data from the primary POSIX storage layer, rather than object storage.
The issue with this approach is that moving data between the primary POSIX layer and object storage is a performance and scalability bottleneck. Even though object storage is designed for high throughput and near-unlimited scale, these advantages are canceled out: solutions that follow this approach typically deliver only 2-3 Gbps of throughput when accessing files stored in AWS S3 through POSIX.

FUSE-based filesystems for object storage access create too much system overhead

Filesystem operations on a FUSE filesystem require many expensive context switches to complete.
On Linux, FUSE enables a user-space application to present a virtual filesystem at a mount point. Solutions like s3fs and goofys use FUSE as a filesystem layer for object storage. However, FUSE introduces significant overhead: each time an application performs an operation on the virtual filesystem, at least four context switches between user space and the kernel are typically involved in servicing the request.
This overhead makes FUSE-backed object storage filesystems very slow, to the point where some applications will not function properly with them.

SHIMs are unable to intercept raw syscalls, thus limiting compatibility

SHIMs are unable to intercept raw syscalls from static and semi-static binaries.
Another approach to handling virtual filesystems is to use a SHIM (such as an LD_PRELOAD library on Linux). The SHIM library intercepts library calls that an application makes against the virtual filesystem and can handle them without necessarily going to the kernel. Applications that depend on libraries in this way are called dynamic binaries. By eliminating expensive context switches, the SHIM approach can significantly improve performance. Unfortunately, a SHIM cannot handle static or semi-static binaries (such as some Go binaries) that talk to the kernel directly via raw syscalls rather than through a library, so it only works with some binaries.
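The library-call-versus-raw-syscall distinction can be illustrated with a small Python analogy (real SHIMs are native libraries loaded via LD_PRELOAD, not Python, and the s3:// handling here is entirely invented for the demo): patching the library-level `open` catches callers that go through it, but a caller that goes straight to the syscall wrapper is never seen.

```python
import builtins, io, os

real_open = builtins.open

def shim_open(path, *args, **kwargs):
    # Library-call interception: redirect virtual-filesystem paths
    # without involving the kernel (hypothetical s3:// scheme).
    if isinstance(path, str) and path.startswith("s3://"):
        return io.StringIO("object contents for " + path)
    return real_open(path, *args, **kwargs)

builtins.open = shim_open

# "Dynamic binary" analogue: the request goes through the patched
# library entry point, so the shim serves it.
data = open("s3://bucket/key").read()

# "Static binary" analogue: os.open wraps the raw syscall, so the shim
# never sees the request and the virtual path simply fails.
try:
    os.open("s3://bucket/key", os.O_RDONLY)
    intercepted = True
except FileNotFoundError:
    intercepted = False
```

The failed `os.open` is the Python analogue of a static Go binary issuing syscalls directly: no amount of library patching reaches it, which is the gap dynamic binary instrumentation closes.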

How cunoFS removes the limitations of existing solutions while staying compatible with legacy scripts, workflows, and apps

cunoFS Direct Interception mode combines a SHIM with dynamic binary instrumentation to intercept dynamic, static, and semi-static binaries.
cunoFS combines the SHIM approach with ultra-fast dynamic binary instrumentation (DBI). This allows cunoFS to intercept both library calls and syscalls, covering dynamic, static, and semi-static binaries. By intercepting filesystem access directly, this approach lets cunoFS support URI-based access in addition to regular path-based access — even for applications that were never designed to be run on object storage.
cunoFS also supports running as a FUSE mount for edge cases that cannot function without a mount path, but this can cost up to half the performance of cunoFS Direct Interception mode. Even so, our FUSE implementation remains highly performant compared with alternative tools.
cunoFS functions entirely on the user system and interfaces directly with object storage. There is no need for any additional infrastructure.

cunoFS FlexMount: a faster version of a FUSE mount

There are some cases (such as Snap, Flatpak, and AppImage applications) that cunoFS Direct Interception mode doesn't currently intercept. We are improving cunoFS Direct Interception mode to cover these cases as well. In the meantime, cunoFS FlexMount mode combines a FUSE mount with cunoFS Direct Interception mode: where possible, Direct Interception is used for performance, and applications can otherwise fall back to FUSE mode for compatibility.

Performant POSIX metadata encoding

Another key problem cunoFS solves is the encoding of POSIX metadata. Some other approaches, like s3fs, store this metadata within each object. This makes modifying metadata or querying the metadata of a directory very expensive: retrieving a POSIX directory listing of a thousand files, for example, would require a thousand API calls to individually check the POSIX metadata of each object. Competing approaches use a separate database server that stores the metadata elsewhere. This splits the POSIX data between the object store and the database, introducing its own set of problems.
With cunoFS (except in Fusion mode; see below), the metadata is encoded directly on object storage within a special metadata folder. POSIX metadata within each directory tends to be highly compressible, since files and directories tend to cluster around similar names, modes, permissions and even timestamps. cunoFS compresses this metadata so that it can be encoded in the filename of hidden objects.
This means, for example, that a directory listing of a thousand files could retrieve the metadata (encoded in filenames) at the same time as retrieving the actual filenames in the directory.
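The general idea can be sketched as follows (this is not cunoFS's actual on-disk format; the `.meta~` prefix and the JSON/zlib/base64 encoding are invented for illustration): pack the POSIX metadata for a whole directory, compress it, and embed it in the key of a hidden object, so one LIST request returns both the filenames and their metadata.

```python
import base64, json, zlib

def encode_dir_metadata(entries):
    """entries: {filename: [mode, uid, gid, mtime]} for one directory."""
    raw = json.dumps(entries, separators=(",", ":")).encode()
    packed = zlib.compress(raw, 9)   # similar names/modes/timestamps compress well
    key_part = base64.urlsafe_b64encode(packed).decode().rstrip("=")
    return ".meta~" + key_part       # hidden object name (hypothetical prefix)

def decode_dir_metadata(key):
    packed = key[len(".meta~"):]
    packed += "=" * (-len(packed) % 4)   # restore base64 padding
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(packed)))

entries = {f"sample_{i:03d}.fastq": [0o644, 1000, 1000, 1700000000]
           for i in range(4)}
key = encode_dir_metadata(entries)
```

A real implementation has to respect object-key length limits (about 1 KB on S3), so the metadata for a large directory would be split across several hidden objects; the round trip above shows only the encode/decode principle.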

Predictive prefetching based on expected storage access patterns

As a result of running on the client node, and even intercepting the application that is being run, cunoFS has much deeper insight into application behavior than ordinary shared filesystems. This enables cunoFS to peek into an application and intelligently prefetch according to each application’s predicted usage. Object storage typically has higher latencies than ordinary filesystems, so high-quality predictions can be essential to delivering high performance.
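A toy sketch of the simplest such prediction, sequential-read prefetching over a high-latency store (cunoFS's real predictor is far more sophisticated; the chunk size, latency, and class here are invented for the demo):

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4          # bytes per simulated GET, tiny for the demo
LATENCY = 0.05     # simulated object-store round-trip time

blob = b"abcdefghijklmnopqrstuvwxyz"

def fetch(offset):                       # one simulated object-store GET
    time.sleep(LATENCY)
    return blob[offset:offset + CHUNK]

class PrefetchingReader:
    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=2)
        self.inflight = {}               # offset -> Future for issued GETs
        self.last = None

    def read(self, offset):
        fut = self.inflight.pop(offset, None)
        data = fut.result() if fut else fetch(offset)
        # Sequential pattern detected: start the next GET before it is asked for,
        # hiding the store's latency behind the caller's processing time.
        if self.last is not None and offset == self.last + CHUNK:
            nxt = offset + CHUNK
            if nxt not in self.inflight:
                self.inflight[nxt] = self.pool.submit(fetch, nxt)
        self.last = offset
        return data

reader = PrefetchingReader()
out = b"".join(reader.read(o) for o in range(0, len(blob), CHUNK))
```

Because cunoFS runs on the client and can observe the application itself, its predictions can go well beyond this kind of stride detection, but the principle is the same: issue the GET before the application asks for the data.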

cunoFS Fusion: a solution for the highest-performance use cases

In addition to regular cunoFS operation, cunoFS Fusion is available on Professional and Enterprise licenses.
cunoFS Fusion enables the best of both worlds, combining high-IOPS file storage with high throughput and lower-cost object storage.

cunoFS Fusion allows customers to pair an existing filesystem capable of high IOPS with the high throughput and low cost of object storage, with both treated as first-class storage tiers.

This approach combines the best of both worlds, enabling both big performance gains and significant cost savings. For example, throughput-limited workloads like genomics or machine learning can see large performance gains from being moved to a first-class object storage tier. cunoFS Fusion can also be paired with a filesystem like Amazon FSx for Lustre or Amazon EFS to deliver higher combined performance and reduce costs for general workloads.
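A hypothetical routing policy in the spirit of this two-tier design (not cunoFS Fusion's actual logic; the threshold and tier names are invented): send small, IOPS-bound I/O to the file tier and large sequential streams to the throughput-oriented object tier.

```python
FILE_TIER, OBJECT_TIER = "file", "object"
LARGE = 8 * 1024 * 1024   # 8 MiB threshold, an assumed tunable

def choose_tier(size_bytes, sequential):
    """Pick a first-class tier for one I/O stream."""
    if size_bytes >= LARGE and sequential:
        return OBJECT_TIER    # throughput-bound: genomics reads, ML datasets
    return FILE_TIER          # IOPS-bound: metadata, small random writes

# Large sequential scan -> object tier; small random update -> file tier.
tiers = (choose_tier(64 * 1024 * 1024, True), choose_tier(4096, False))
```

The point of treating both tiers as first-class is that this decision can be made per stream, rather than demoting the object store to an archive behind the filesystem.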

How cunoFS keeps you in control of your data and workflows

We don’t compress, scramble, or deduplicate data

We do not compress or deduplicate data – this is an explicit cunoFS design goal.
cunoFS focuses on making POSIX on object storage work at the highest possible performance, so scrambling data (which competing filesystem solutions require for compression/deduplication) or requiring an S3-to-S3 gateway to translate between scrambled and unscrambled formats is unacceptable. Many organizations we work with have already tried such solutions and reported unacceptably poor performance and scalability. Furthermore, the bulk of the customer data footprint in these organizations – for example image, video, or omics datasets – tends to be already compressed, and is thus difficult (and unnecessary) to deduplicate.

cunoFS works with all of your existing apps, including your shell

The cuno command enables cunoFS Direct Interception mode to intercept the user’s preferred shell, whether that is bash, zsh, csh, or something else. Tab completion, preferences/configuration, and other actions are provided by the existing shell, so everything works the same way it does without cunoFS. Applications run the same as well: running a command like ls actually launches the unmodified ls binary installed on the system.

cunoFS works with software you’ve already written in C/C++, Java, Python, Go, Rust, or any other programming language

cunoFS is designed to work with any software, including software that you’ve written yourself.

We support the OS of your choice

We support Linux, Windows (via WSL 2), and macOS (via Docker). Native Windows and macOS clients are in development.
Visit the cunoFS Download page for direct downloads and installation instructions for each platform.

cunoFS works inside Docker, Kubernetes, and Singularity containers

cunoFS can be used with containers in multiple ways:
  • by mounting on the host and bind-mounting into the container
  • by injecting the cunoFS library into an unmodified container (we provide a seamless way to do this)
  • by installing cunoFS inside the container
For instructions specific to Docker and Singularity please see the cunoFS user guide. Our CSI driver for Kubernetes is now available on request. Note that cunoFS does not require any special privileges to run inside a container.

cunoFS supports multi-node POSIX consistency and locking semantics

cunoFS Fusion (requires a Pro or Enterprise license) and cunoFS Enterprise Edition support multi-node POSIX consistency and locking semantics.

cunoFS demonstrated: object storage as a filesystem in action

Have questions about cunoFS?

Book a demo to have any technical and commercial questions answered.

Ready to get started with high performance, POSIX object storage?

Start using cunoFS today