Notes on lessfs setup

lessfs (lessfs.com) is a high-performance inline data deduplicating filesystem for Linux. While setting it up I collected a few notes that may be helpful for future setups. Mostly this is a reference for myself - it is in no way an authoritative guide, nor do I guarantee that any of this is correct. If something is wrong and you know so, let me know and I'll correct it.

Before setting up the actual file system we need to make sure all dependencies are met. Via your package manager (e.g. apt-get) install the following:
- build-essential
- libselinux1-dev
- libsepol1-dev
- pkg-config
- checkinstall (if you prefer to install your packages with 'make install' you won't need this)

Further, I was missing the following two libs, which are required for tokyocabinet (also available through the package manager):
- zlib1g-dev
- libbz2-dev

Additionally, for 1.1.0 beta 6 and upwards I needed:
- libmhash2
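As a small convenience, the package lists above can be collected into a single install command. This is only a sketch: the package names are exactly the ones listed above and may differ on your distro or release, and the command is printed rather than executed so you can review it before running it as root.

```shell
#!/bin/sh
# Package names copied from the lists above; adjust them for your distro.
packages="build-essential libselinux1-dev libsepol1-dev pkg-config checkinstall zlib1g-dev libbz2-dev libmhash2"

# Build the command as a string and print it instead of running it,
# so nothing is installed until you have checked the list.
install_cmd="apt-get install $packages"
echo "$install_cmd"
```

Drop checkinstall from the list if you prefer plain 'make install'.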

Now set up the most recent version of fuse (sourceforge). Untar, run configure, make, and checkinstall (or make install) as root.
Some notes on this: due to my distro I had to manually create a link between /lib/libfuse.so.2 and /usr/local/lib/libfuse.so.2.8.3.

For lessfs 1.1.0-beta6 and upwards I also needed mhash (sourceforge). Untar, run configure, make and checkinstall (or make install) as root.

Set up the most recent version of tokyocabinet (1978th.net). Untar, run configure (read the notes first!), make and checkinstall (or make install) as root.
Some notes on this: If you are on a 32bit system run configure --enable-off64.

If all went well, set up lessfs (sourceforge). Untar, run configure, make, and checkinstall (or make install) as root.

Now comes the actual setup of a lessfs filesystem.
I chose ext4 as the underlying filesystem, but most others should do. Make sure the underlying filesystem itself is set up correctly.

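For instance, a RAID array should be formatted with the correct settings (N is the number of disks; N-1 data disks applies to a single-parity level such as RAID 5):
stride size = chunk size / block size
stripe width = stride * (N-1)
Check the following if you don't like calculating manually: raid calculator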
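The two formulas above are easy to evaluate in the shell. The following is a minimal sketch with example values that are my assumptions (64 KiB chunk, 4 KiB ext4 block, 4 disks); /dev/md0 is a placeholder device name, and the mkfs.ext4 line is only printed, not executed.

```shell
#!/bin/sh
# Example values (assumptions, not from my setup): adjust to your array.
chunk_kib=64   # RAID chunk size in KiB
block_kib=4    # ext4 block size in KiB
disks=4        # N, total number of disks in the array

# stride size = chunk / block
stride=$((chunk_kib / block_kib))
# stripe width = stride * (N-1)
stripe_width=$((stride * (disks - 1)))

echo "stride=$stride stripe_width=$stripe_width"

# /dev/md0 is a placeholder; this only prints the command for review.
echo "mkfs.ext4 -E stride=$stride,stripe_width=$stripe_width /dev/md0"
```

With these example values the script reports a stride of 16 and a stripe width of 48.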

Next, we actually set up the lessfs filesystem. Most settings are set in the configuration file - there should be one in the package downloaded (folder etc).
Copy this into /etc/ .

Alternatively, jkiel posted a very cool shell script that calculates these values and creates a config file for lessfs (see [3] below).


We need to adjust the number of buckets in the tokyocabinet DB (BLOCKDATA_BS, BLOCKUSAGE_BS, ...)
We can calculate this if we know a ballpark figure of how much data the database is to store.
The number of buckets in the database is the following:
# of buckets = 4 * (bytes to be stored / blocksize)
Here's an example:
bytes to be stored = 42GB = 42 * (1024*1024*1024) = 45097156608
blocksize = 64k = 64*1024 = 65536
Calculate it and we get 4 * (45097156608 / 65536) = 4 * 688128 = 2752512
The only constant in the equation, the 4, can be any number between 0.5 and 4, as described in the tokyocabinet docs:
“Suggested size of the bucket array is about from 0.5 to 4 times of the number of all records to be stored”
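To avoid doing this by hand, the formula can be wrapped in a small shell function. This is a sketch only: calc_buckets is a name I made up, and because it uses integer shell arithmetic the factor has to be a whole number (so 1 to 4, not 0.5).

```shell
#!/bin/sh
# buckets = factor * (bytes to be stored / blocksize)
# calc_buckets is a hypothetical helper, not part of lessfs or tokyocabinet.
# Arguments: bytes to store, blocksize in bytes, optional integer factor (default 4).
calc_buckets() {
    cb_bytes=$1
    cb_blocksize=$2
    cb_factor=${3:-4}
    echo $(( cb_factor * (cb_bytes / cb_blocksize) ))
}

# The example from above: 42GB of data with a 64k blocksize.
bytes=$((42 * 1024 * 1024 * 1024))
blocksize=$((64 * 1024))
calc_buckets "$bytes" "$blocksize"
```

For the 42GB / 64k example this prints 2752512.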

If you want a different blocksize, such as 128k, you will need to recalculate the number of buckets accordingly.
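As an illustration of such an adjustment, the sketch below recalculates the bucket count for a 128k blocksize and shows one way to write it into the config file. The helper set_cfg_value and the path /etc/lessfs.cfg are hypothetical, and the sed expression assumes the config uses plain KEY=value lines, so check your own file first.

```shell
#!/bin/sh
# Replace an existing "KEY=..." line in a config file with "KEY=value".
# set_cfg_value is a hypothetical helper; it assumes plain KEY=value lines.
set_cfg_value() {
    scv_cfg=$1
    scv_key=$2
    scv_value=$3
    scv_tmp="$scv_cfg.tmp.$$"
    sed "s|^$scv_key[[:space:]]*=.*|$scv_key=$scv_value|" "$scv_cfg" > "$scv_tmp" \
        && mv "$scv_tmp" "$scv_cfg"
}

# 42GB of data with a 128k blocksize and a factor of 4.
buckets_128k=$(( 4 * (42 * 1024 * 1024 * 1024 / (128 * 1024)) ))
echo "$buckets_128k"

# Usage (hypothetical config path, left commented out on purpose):
# set_cfg_value /etc/lessfs.cfg BLOCKDATA_BS "$buckets_128k"
# set_cfg_value /etc/lessfs.cfg BLOCKUSAGE_BS "$buckets_128k"
```

Doubling the blocksize halves the number of blocks, so the bucket count drops to 1376256 for the same 42GB.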

In the config file, set the correct paths for the database files. Two locations are involved: the place where the actual data gets stored (the hard disk), and the mount point where we mount the lessfs filesystem and through which we access and store files. The config file only defines the paths for the actual data. For instance, if you mount the hard drive at /media/hdd1, define your paths in the lessfs config file as /media/hdd1/data/dta/blockdata.tch .

Now it's time to actually create the files. Issue the following command (be sure you have mounted the hard drive and have write access to it):
mklessfs -fc /etc/pathtoconfig.cfg

Look into the directory on the mounted hard drive; some files should have been created for us.
You can run the following on your blockdata.tch:
tchmgr inform /media/path/to/your/blockdata.tch

This should give you an output similar to this:
path: /media/path/to/your/blockdata.tch
database type: hash
additional flags: open
bucket number: 2752552
alignment: 1
free block pool: 1
inode number: 25
modified time: 2010-04-01T13:55:15+01:00
options: large
record number: 0
file size: 192696458

Be sure the bucket number matches the value you calculated for your own setup.
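If you want to script this check, the bucket number can be pulled out of the tchmgr output with awk. This is a sketch: bucket_number is a helper name I made up, and it assumes the output has the "name: value" layout shown above.

```shell
#!/bin/sh
# Read `tchmgr inform` output on stdin and print the "bucket number" value.
# bucket_number is a hypothetical helper, not part of tokyocabinet.
bucket_number() {
    awk -F': *' '$1 == "bucket number" { print $2 }'
}

# Usage (the path is a placeholder, and $expected is a value you set yourself):
# actual=$(tchmgr inform /media/path/to/your/blockdata.tch | bucket_number)
# echo "calculated: $expected  reported: $actual"
```

Printing both values side by side keeps the comparison explicit rather than failing silently.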

Now mount your lessfs filesystem to a mount point - the following is a single command:
lessfs /etc/pathtoyourconfig.cfg /media/path/to/your/mountpoint

/media/path/to/your/mountpoint should be an empty directory where the lessfs filesystem will be mounted and which gives you access to the files on the filesystem.

If all went well you can now simply copy files into this directory and lessfs deduplicates and compresses them on the fly.

To tune this: a larger blocksize such as 128k (131072) will give faster performance, at the cost of less (in practice negligible) deduplication.

When the underlying hardware crashes, it is possible that the databases become corrupted beyond repair. Tokyocabinet does have transaction support, but not grouped over a set of databases. So in rare cases one database can have its transactions committed before a crash while another one was still in the process of doing so. Only fsck lets lessfs recover from such an event.

The blockusage database is important for performance reasons, but it is the only database that can be reconstructed from scratch. It records how often a particular block (hash) is referenced. We need this count to know whether a block can be deleted.

Losing the fileblock database would be a disaster that would not be recoverable. This database contains a list of inode-blocknr : hash entries, which tells us how an inode is built.

Since both databases (btrees) are updated and accessed upon every write, it makes sense to put them on fast (IOPS) storage.
Put all the metadata on one Raid1 SSD set, or split them over multiple Raid1 SAS sets.
The bottom line here is that more IOPS for metadata equals more sequential throughput for lessfs.

Some sources where I collected information:
[1] HOWTO: Install LessFS (Deduplication FS) on 9.10
[2] Lessfs feedback comments
[3] Lessfs config file shell script