Writing a FUSE filesystem in Python

by Davide Mastromatteo

We ran into a problem last week. Our web application produces a lot of documents that have to be accessed frequently for a couple of months after they’re created. However, within a year these documents are almost never accessed anymore, but we need to keep them available for the web application and for tons of other legacy apps that might need to access them.

Now, these documents take up a lot of space on our expensive but super-fast storage system (let’s call it the primary storage system, or PSS, from now on), and we would like to move them to the cheaper, not-so-good and rather slow storage system (which we’re going to call the secondary storage system, or SSS) once we believe they will not be accessed anymore.

Our idea was to move the older files to the SSS and to modify all the software that accesses the storage so that it looks at the PSS first and, if nothing is found there, looks at the SSS. This approach, however, meant modifying every piece of client software we had…

“There are no problems, only opportunities” — I.R.

So, wouldn’t it be great if we could create a virtual filesystem to map both the PSS and the SSS into a single directory?

And that’s what we’re gonna do today.

From the client software perspective, everything will remain unchanged, but under the hood all our read and write operations will be forwarded to the correct storage system.

Please note: I’m not saying that this is the best solution ever for this specific problem. There are probably better solutions to address this problem but… we have to talk about Python, don’t we?

What we’ll need

To start this project we just need to satisfy a couple of prerequisites:

  • Python
  • A good OS

I assume that you already have Python (if not… what are you doing here?). As for the OS, keep in mind that this article is based on FUSE.

According to Wikipedia, FUSE is

a software interface for Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a “bridge” to the actual kernel interfaces.

FUSE is available for Linux, FreeBSD, OpenBSD, NetBSD (as puffs), OpenSolaris, Minix 3, Android and macOS.

So, if you use macOS you need to download and install FUSE. If you use Linux, keep in mind that FUSE was merged into the mainline Linux kernel in version 2.6.14, released on October 27th, 2005, so every recent version of Linux already has it.

If you use Windows… well… I mean… I’m sorry buddy, but you didn’t satisfy the second prerequisite…

The fusepy module

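First of all, to communicate with the FUSE module from Python you will need to install the fusepy module. This module is just a simple (ctypes-based) interface to FUSE and MacFUSE, nothing more. Keep in mind that it loads the user-space FUSE library (libfuse) when it is imported, so on Linux that library has to be installed too, not just the kernel module. Go ahead and install fusepy using pip: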

pip install fusepy

Let’s start

There’s a great start point for building our filesystem, and it’s the Stavros Korokithakis code. What Stavros made is available on his GitHub repo and I will report it here:

#!/usr/bin/env python

from __future__ import with_statement

import os
import sys
import errno

from fuse import FUSE, FuseOSError, Operations


class Passthrough(Operations):
    def __init__(self, root):
        self.root = root

    # Helpers
    # =======

    def _full_path(self, partial):
        if partial.startswith("/"):
            partial = partial[1:]
        path = os.path.join(self.root, partial)
        return path

    # Filesystem methods
    # ==================

    def access(self, path, mode):
        full_path = self._full_path(path)
        if not os.access(full_path, mode):
            raise FuseOSError(errno.EACCES)

    def chmod(self, path, mode):
        full_path = self._full_path(path)
        return os.chmod(full_path, mode)

    def chown(self, path, uid, gid):
        full_path = self._full_path(path)
        return os.chown(full_path, uid, gid)

    def getattr(self, path, fh=None):
        full_path = self._full_path(path)
        st = os.lstat(full_path)
        return dict((key, getattr(st, key)) for key in ('st_atime', 'st_ctime',
                     'st_gid', 'st_mode', 'st_mtime', 'st_nlink', 'st_size', 'st_uid'))

    def readdir(self, path, fh):
        full_path = self._full_path(path)

        dirents = ['.', '..']
        if os.path.isdir(full_path):
            dirents.extend(os.listdir(full_path))
        for r in dirents:
            yield r

    def readlink(self, path):
        pathname = os.readlink(self._full_path(path))
        if pathname.startswith("/"):
            # Path name is absolute, sanitize it.
            return os.path.relpath(pathname, self.root)
        else:
            return pathname

    def mknod(self, path, mode, dev):
        return os.mknod(self._full_path(path), mode, dev)

    def rmdir(self, path):
        full_path = self._full_path(path)
        return os.rmdir(full_path)

    def mkdir(self, path, mode):
        return os.mkdir(self._full_path(path), mode)

    def statfs(self, path):
        full_path = self._full_path(path)
        stv = os.statvfs(full_path)
        return dict((key, getattr(stv, key)) for key in ('f_bavail', 'f_bfree',
            'f_blocks', 'f_bsize', 'f_favail', 'f_ffree', 'f_files', 'f_flag',
            'f_frsize', 'f_namemax'))

    def unlink(self, path):
        return os.unlink(self._full_path(path))

    def symlink(self, name, target):
        # fusepy passes the new link's path first and the link's contents second
        return os.symlink(target, self._full_path(name))

    def rename(self, old, new):
        return os.rename(self._full_path(old), self._full_path(new))

    def link(self, target, name):
        # fusepy passes the new link's path first and the existing file second
        return os.link(self._full_path(name), self._full_path(target))

    def utimens(self, path, times=None):
        return os.utime(self._full_path(path), times)

    # File methods
    # ============

    def open(self, path, flags):
        full_path = self._full_path(path)
        return os.open(full_path, flags)

    def create(self, path, mode, fi=None):
        full_path = self._full_path(path)
        return os.open(full_path, os.O_WRONLY | os.O_CREAT, mode)

    def read(self, path, length, offset, fh):
        os.lseek(fh, offset, os.SEEK_SET)
        return os.read(fh, length)

    def write(self, path, buf, offset, fh):
        os.lseek(fh, offset, os.SEEK_SET)
        return os.write(fh, buf)

    def truncate(self, path, length, fh=None):
        full_path = self._full_path(path)
        with open(full_path, 'r+') as f:
            f.truncate(length)

    def flush(self, path, fh):
        return os.fsync(fh)

    def release(self, path, fh):
        return os.close(fh)

    def fsync(self, path, fdatasync, fh):
        return self.flush(path, fh)


def main(mountpoint, root):
    FUSE(Passthrough(root), mountpoint, nothreads=True, foreground=True)

if __name__ == '__main__':
    main(sys.argv[2], sys.argv[1])

Take a minute to analyze Stavros' code. It implements a “passthrough filesystem”, which simply mounts a directory onto a mount point: every operation requested on the mount point is forwarded, through the corresponding Python os function, to the real file in the mounted directory.
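The key piece of that forwarding is _full_path, which turns the path FUSE receives (relative to the mount point, but starting with "/") into a real path. Here is a minimal standalone sketch of the same translation; full_path and the /srv/pss directory are hypothetical names used only for illustration. It also shows why the leading slash is stripped first: os.path.join discards everything before an absolute component, so joining without stripping would escape the mounted directory.

```python
import posixpath  # os.path is posixpath on Linux and macOS


def full_path(root, partial):
    """Sketch of Passthrough._full_path as a plain function."""
    if partial.startswith("/"):
        partial = partial[1:]   # "/docs/a.pdf" -> "docs/a.pdf"
    return posixpath.join(root, partial)


# Joining an absolute path discards the root entirely...
print(posixpath.join("/srv/pss", "/docs/a.pdf"))  # /docs/a.pdf
# ...while the stripped version stays inside the mounted directory.
print(full_path("/srv/pss", "/docs/a.pdf"))       # /srv/pss/docs/a.pdf
print(full_path("/srv/pss", "/"))                 # /srv/pss/
```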

So, to try this code just save this file as Passthrough.py and run

python Passthrough.py [directoryToBeMounted] [directoryToBeUsedAsMountpoint]

That’s it! Now your brand-new filesystem is mounted on [directoryToBeUsedAsMountpoint], and every operation you perform on this mount point is silently passed through to [directoryToBeMounted].

Really cool, even if a little bit useless so far… :)

So, how can we implement the filesystem described above? Thanks to Stavros, our job is quite simple. We just need to create a class that inherits from Stavros' Passthrough class and overrides some of its methods.

The first method we have to override is _full_path. In the original code, this method takes a path relative to the mount point and translates it into the real path inside the mounted directory. In our filesystem, this will be the most difficult piece of code, because we need to add some logic to decide whether the requested path belongs to the PSS or to the SSS. However, even this “most difficult piece of code” is quite simple.

We just need to check whether the requested path exists in at least one storage system, looking at the PSS first. If it does, we return that real path. If not, we assume that the path has been requested for a write operation on a file that does not exist yet, so we look at where the file’s parent directory exists: we return the path on the PSS, unless that directory exists only on the SSS.

A look at the code will make things clearer:

    def _full_path(self, partial, useFallBack=False):
        if partial.startswith("/"):
            partial = partial[1:]

        # Find out the real path. If the fallback path has been requested,
        # use it
        path = primaryPath = os.path.join(
            self.fallbackPath if useFallBack else self.root, partial)

        # If the path does not exist and we haven't been asked for the fallback
        # path, try to look in the fallback filesystem
        if not os.path.exists(primaryPath) and not useFallBack:
            path = fallbackPath = os.path.join(self.fallbackPath, partial)

            # If the path does not exist in the fallback filesystem either,
            # it's likely to be a write operation, so use the primary
            # filesystem... unless the directory containing the file exists
            # only in the fallback FS!
            if not os.path.exists(fallbackPath):
                # This is probably a write operation, so prefer the primary
                # path if the directory of the path either exists in the
                # primary FS or does not exist in the fallback FS

                primaryDir = os.path.dirname(primaryPath)
                fallbackDir = os.path.dirname(fallbackPath)

                if os.path.exists(primaryDir) or not os.path.exists(fallbackDir):
                    path = primaryPath

        return path
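Because this logic only touches os.path, it can be tried without mounting anything. The following is a hypothetical standalone copy of the decision _full_path makes when useFallBack is False (resolve is not part of the dfs code), exercised on two temporary directories standing in for the PSS and the SSS:

```python
import os
import tempfile


def resolve(partial, primary, fallback):
    """Standalone copy of the _full_path decision logic (useFallBack=False)."""
    if partial.startswith("/"):
        partial = partial[1:]
    primary_path = os.path.join(primary, partial)
    if os.path.exists(primary_path):
        return primary_path                      # found on the PSS
    fallback_path = os.path.join(fallback, partial)
    if os.path.exists(fallback_path):
        return fallback_path                     # found on the SSS
    # Found nowhere: probably a file about to be created. Use the PSS unless
    # its parent directory exists only on the SSS.
    if (os.path.exists(os.path.dirname(primary_path))
            or not os.path.exists(os.path.dirname(fallback_path))):
        return primary_path
    return fallback_path


with tempfile.TemporaryDirectory() as pss, tempfile.TemporaryDirectory() as sss:
    os.makedirs(os.path.join(pss, "2024"))       # recent documents
    os.makedirs(os.path.join(sss, "2019"))       # archived documents
    open(os.path.join(sss, "2019", "old.pdf"), "w").close()

    cases = {
        "/2019/old.pdf": resolve("/2019/old.pdf", pss, sss),  # exists on SSS
        "/2019/new.pdf": resolve("/2019/new.pdf", pss, sss),  # new, parent on SSS
        "/2024/new.pdf": resolve("/2024/new.pdf", pss, sss),  # new, parent on PSS
    }
    for request, real in cases.items():
        print(request, "->", "SSS" if real.startswith(sss + os.sep) else "PSS")
```

This prints /2019/old.pdf -> SSS, /2019/new.pdf -> SSS and /2024/new.pdf -> PSS: a new file lands on the SSS only when its directory exists exclusively there.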

With this done, we have almost finished. If we’re using a Linux system, we also have to override the getattr function so that it returns the st_blocks attribute as well (it turned out that without this attribute the du command doesn’t work as expected, because du computes disk usage from st_blocks rather than from the file size).

So we just need to override this method and return the extra attribute:

    def getattr(self, path, fh=None):
        full_path = self._full_path(path)
        st = os.lstat(full_path)
        return dict((key, getattr(st, key)) for key in (
            'st_atime', 'st_ctime', 'st_gid', 'st_mode', 'st_mtime',
            'st_nlink', 'st_size', 'st_uid', 'st_blocks'))
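To see what st_blocks adds, here is a small sketch that builds the same dictionary for a local file and compares the apparent size (st_size) with the allocated size that du counts. It assumes Linux, where st_blocks is expressed in 512-byte units; stat_dict and disk_usage_bytes are hypothetical helper names. A sparse file makes the difference obvious:

```python
import os
import tempfile

# Same keys as the overridden getattr, including st_blocks.
STAT_KEYS = ('st_atime', 'st_ctime', 'st_gid', 'st_mode', 'st_mtime',
             'st_nlink', 'st_size', 'st_uid', 'st_blocks')


def stat_dict(path):
    """Build the dictionary that getattr returns to FUSE."""
    st = os.lstat(path)
    return dict((key, getattr(st, key)) for key in STAT_KEYS)


def disk_usage_bytes(attrs):
    """Allocated bytes, assuming Linux's 512-byte st_blocks unit."""
    return attrs['st_blocks'] * 512


with tempfile.NamedTemporaryFile() as f:
    f.seek(1024 * 1024)   # skip 1 MiB without writing it (a hole)...
    f.write(b"x")         # ...then write one byte: a sparse file
    f.flush()
    attrs = stat_dict(f.name)

print(attrs['st_size'])          # 1048577: the apparent size
print(disk_usage_bytes(attrs))   # what du counts: usually much smaller
```

If getattr leaves st_blocks out, the value FUSE reports for it is presumably zero, which would explain why du shows the files as taking no space.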

Then we need to override readdir, the generator function that is called when someone runs “ls” on our mount point. In our case, “ls” has to list the contents of both our primary storage system and our secondary storage system.

    def readdir(self, path, fh):
        dirents = ['.', '..']
        full_path = self._full_path(path)
        if os.path.isdir(full_path):
            dirents.extend(os.listdir(full_path))
        # If the path resolved to the primary FS, add the entries of the
        # same directory on the fallback FS as well
        if self.fallbackPath not in full_path:
            full_path = self._full_path(path, useFallBack=True)
            if os.path.isdir(full_path):
                dirents.extend(os.listdir(full_path))
        # set() removes the names that exist on both storage systems
        for r in list(set(dirents)):
            yield r
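The merge itself is just a set union of two os.listdir results. Here is a hypothetical standalone function (merged_listing is not part of the dfs code) that reproduces what readdir yields, tried on two temporary directories sharing one file name; the result is sorted here only to make the output predictable, since ls sorts the entries anyway:

```python
import os
import tempfile


def merged_listing(primary_dir, fallback_dir):
    """Union of two directory listings, like the dfs readdir."""
    dirents = ['.', '..']
    for directory in (primary_dir, fallback_dir):
        if os.path.isdir(directory):   # may exist on one side only
            dirents.extend(os.listdir(directory))
    return sorted(set(dirents))        # set() drops duplicated names


with tempfile.TemporaryDirectory() as pss, tempfile.TemporaryDirectory() as sss:
    for name in ("new.pdf", "both.pdf"):
        open(os.path.join(pss, name), "w").close()
    for name in ("old.pdf", "both.pdf"):
        open(os.path.join(sss, name), "w").close()
    listing = merged_listing(pss, sss)

print(listing)  # ['.', '..', 'both.pdf', 'new.pdf', 'old.pdf']
```

Note that both.pdf is listed once, and since _full_path checks the PSS first, opening it reaches the PSS copy: the SSS copy is effectively hidden.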

We’re almost done: we just need to rewrite the “main” function, because we need an extra parameter (the original code takes one directory to be mounted and one directory to be used as a mount point, while our filesystem needs two directories to be mounted onto the mount point).

So here is the full code of our new filesystem, “dfs” (the “Dave File System” :D). Since it imports the Passthrough class, save it as dfs.py next to Passthrough.py:

#!/usr/bin/env python

import os
import sys

from fuse import FUSE
from Passthrough import Passthrough  # Passthrough.py must be in the same directory


class dfs(Passthrough):
    def __init__(self, root, fallbackPath):
        self.root = root
        self.fallbackPath = fallbackPath

    # Helpers
    # =======
    def _full_path(self, partial, useFallBack=False):
        if partial.startswith("/"):
            partial = partial[1:]
        # Find out the real path. If the fallback path has been requested,
        # use it
        path = primaryPath = os.path.join(
            self.fallbackPath if useFallBack else self.root, partial)
        # If the path does not exist and we haven't been asked for the fallback
        # path, try to look in the fallback filesystem
        if not os.path.exists(primaryPath) and not useFallBack:
            path = fallbackPath = os.path.join(self.fallbackPath, partial)
            # If the path does not exist in the fallback filesystem either,
            # it's likely to be a write operation, so use the primary
            # filesystem... unless the directory containing the file exists
            # only in the fallback FS!
            if not os.path.exists(fallbackPath):
                # This is probably a write operation, so prefer the primary
                # path if the directory of the path either exists in the
                # primary FS or does not exist in the fallback FS
                primaryDir = os.path.dirname(primaryPath)
                fallbackDir = os.path.dirname(fallbackPath)
                if os.path.exists(primaryDir) or not os.path.exists(fallbackDir):
                    path = primaryPath
        return path

    def getattr(self, path, fh=None):
        full_path = self._full_path(path)
        st = os.lstat(full_path)
        return dict((key, getattr(st, key)) for key in (
            'st_atime', 'st_ctime', 'st_gid', 'st_mode', 'st_mtime',
            'st_nlink', 'st_size', 'st_uid', 'st_blocks'))

    def readdir(self, path, fh):
        dirents = ['.', '..']
        full_path = self._full_path(path)
        if os.path.isdir(full_path):
            dirents.extend(os.listdir(full_path))
        # If the path resolved to the primary FS, add the entries of the
        # same directory on the fallback FS as well
        if self.fallbackPath not in full_path:
            full_path = self._full_path(path, useFallBack=True)
            if os.path.isdir(full_path):
                dirents.extend(os.listdir(full_path))
        # set() removes the names that exist on both storage systems
        for r in list(set(dirents)):
            yield r


def main(mountpoint, root, fallbackPath):
    FUSE(dfs(root, fallbackPath), mountpoint, nothreads=True,
         foreground=True, **{'allow_other': True})

if __name__ == '__main__':
    if len(sys.argv) != 4:
        sys.exit("usage: %s primaryDir fallbackDir mountpoint" % sys.argv[0])
    mountpoint = sys.argv[3]
    root = sys.argv[1]
    fallbackPath = sys.argv[2]
    main(mountpoint, root, fallbackPath)
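The script reads its arguments by position from sys.argv. If you prefer a usage message and per-argument help, one possible alternative (a sketch, not part of the original code; parse_args is a hypothetical name) is to parse the same three positional arguments with argparse and then call main(args.mountpoint, args.root, args.fallbackPath):

```python
import argparse


def parse_args(argv=None):
    """Parse dfs.py's three positional arguments (hypothetical CLI)."""
    parser = argparse.ArgumentParser(
        description="Mount a primary and a fallback directory on one mount point.")
    parser.add_argument("root", help="directory on the primary storage system")
    parser.add_argument("fallbackPath", help="directory on the secondary storage system")
    parser.add_argument("mountpoint", help="empty directory to mount dfs on")
    return parser.parse_args(argv)  # argv=None means sys.argv[1:]


args = parse_args(["/data/pss", "/data/sss", "/mnt/dfs"])
print(args.root, args.fallbackPath, args.mountpoint)  # /data/pss /data/sss /mnt/dfs
```

The positional order matches the command shown next, and argparse prints a usage message and exits when an argument is missing.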

That’s it, now if we issue the command …

python dfs.py /home/dave/Desktop/PrimaryFS/ /home/dave/Desktop/FallbackFS/ /home/dave/Desktop/myMountpoint/

… we get a mount point (/home/dave/Desktop/myMountpoint/) that lists the contents of both /home/dave/Desktop/PrimaryFS/ and /home/dave/Desktop/FallbackFS/ and that works as expected.

Yes, it was THAT easy!

A couple of notes

It is worth noting that:

  • When we instantiate the FUSE object with foreground=False, the filesystem runs in the background instead of keeping the terminal busy.
  • The {'allow_other': True} option is really important if you need to share the mount point over the network with Samba: by default a FUSE mount is accessible only to the user who mounted it, so without this option the directory can’t be shared.
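One more detail about allow_other: when the filesystem is mounted by a non-root user, FUSE accepts this option only if user_allow_other is enabled in /etc/fuse.conf. A small hypothetical helper (allows_other is not part of fusepy) that checks a fuse.conf body for that line could look like this:

```python
def allows_other(fuse_conf_text):
    """Return True if a fuse.conf body enables user_allow_other."""
    for line in fuse_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()   # ignore comments and blanks
        if line == "user_allow_other":
            return True
    return False


print(allows_other("#user_allow_other\n"))                   # False: commented out
print(allows_other("mount_max = 1000\nuser_allow_other\n"))  # True

# On a real system (the file may be missing on some distributions):
# with open("/etc/fuse.conf") as f:
#     print(allows_other(f.read()))
```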

That’s all folks, now stop reading and start to develop your first filesystem with Python! :)

D.

