
Notes on lessfs setup
lessfs (lessfs.com) is a high-performance inline data deduplicating filesystem for Linux. While setting it up I collected a few notes that may be helpful for future setups. Mostly this is a way for me to reference this later - it is in no way an authoritative guide, nor do I guarantee that any of this is correct. In fact, if it isn't and you know so, let me know and I'll correct it.

Before setting up the actual file system we need to make sure all dependencies are met. Via your package manager (e.g. apt-get) install the following:

Further, I was missing the following two libs which are required for tokyocabinet (also available through the package manager):

Additionally, for 1.1.0 beta 6 and upwards I needed:
- libmhash2

Now set up the most recent version of fuse (sourceforge). Untar, run configure, make, and checkinstall (or make install) as root.

For lessfs 1.1.0-beta6 and upwards I also needed mhash (sourceforge). Untar, run configure, make, and checkinstall (or make install) as root.

Set up the most recent version of tokyocabinet (1978th.net). Untar, run configure (read the notes first!), make, and checkinstall (or make install) as root.

If all went well, set up lessfs (sourceforge). Untar, run configure, make, and checkinstall (or make install) as root.

For instance, a raid should be formatted with the correct settings for:

Next, we actually set up the lessfs filesystem. Most settings are set in the configuration file - there should be one in the downloaded package (folder etc). Alternatively, jkiel posted a very cool shell script to calculate these values and create a config file for lessfs - here
Be sure to consider whether you want your blocksize to be 128k; if you want something else, you will need to adjust this.

In the config file set the correct paths for the database files. We will need one location where the actual data gets stored (the hard disk) and a mount point where we mount the filesystem and where we can access and store things in the database. In the config file you define only the paths for the actual data. For instance, if you mount the hard drive at /media/hdd1, define your paths in the lessfs config file as /media/hdd1/data/dta/blockdata.tch.

Now it's time to actually create the files. Issue the following command (be sure you have mounted the hdd and have write access to it):

Look into the mount point; we should have some files created for us. This should give you an output similar to this:
Be sure your bucket number is the same as what we calculated before. Now mount your lessfs filesystem to a mount point - the following is a single command:

/media/path/to/your/mountpoint should be an empty directory where the lessfs filesystem will be mounted and which gives you access to the files on the filesystem. If all went well you can now simply copy files into this directory and lessfs deduplicates and compresses them on the fly.

To tune this: a larger blocksize such as 128k (131072) will give faster performance, at the cost of less (actually negligible) deduplication.
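To get a feel for the blocksize trade-off, here is a toy sketch in Python. It is not how lessfs is implemented: the function name dedup_stats, the use of SHA-256 as the block hash and the synthetic test data are all my own assumptions for illustration, and compression is not modelled at all. It only shows the general idea of fixed-size block deduplication: cut the data into blocks, hash each block, and store each distinct hash once.

```python
import hashlib


def dedup_stats(data: bytes, blocksize: int):
    """Return (total_blocks, unique_blocks) for fixed-size blocks of data.

    Toy model only: SHA-256 stands in for whatever hash the real
    filesystem uses, and nothing is actually stored or compressed.
    """
    seen = set()
    total = 0
    for offset in range(0, len(data), blocksize):
        block = data[offset:offset + blocksize]
        seen.add(hashlib.sha256(block).digest())
        total += 1
    return total, len(seen)


def unit(value: int) -> bytes:
    """A 4 KiB chunk filled with one byte value (hypothetical test data)."""
    return bytes([value]) * 4096


# Layout: A B A C A D A E, where A repeats every other 4 KiB chunk.
data = unit(0) + unit(1) + unit(0) + unit(2) + unit(0) + unit(3) + unit(0) + unit(4)

for blocksize in (4096, 8192, 131072):
    total, unique = dedup_stats(data, blocksize)
    print(blocksize, total, unique)
# 4096 8 5
# 8192 4 4
# 131072 1 1
```

With 4k blocks the repeated chunk is found four times and stored once, but eight hashes have to be computed and looked up. With 8k blocks the repetition no longer lines up with the block boundaries, so nothing is deduplicated, but only four lookups are needed. This data is deliberately constructed to make the effect visible; on real files the loss from larger blocks depends entirely on the data.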
When the underlying hardware crashes, it is possible that the databases become corrupted beyond repair. Tokyocabinet does have transaction support, but not grouped over a set of databases. So in rare cases one database can have its transactions committed before a crash while another one was still in the process of doing so. Only fsck lets lessfs recover from such an event.

The blockusage database is important for performance reasons, but it is the only database that can be reconstructed from scratch. It records how often a particular block (hash) is used. We need this reference count to know whether a block can be deleted.

Losing the fileblock database would be a disaster that would not be recoverable. This database contains a list of inode-blocknr : hash entries, so it tells us how an inode is built.

Since both databases (btrees) are updated and accessed upon every write, it makes sense to put them on fast (IOPS) storage.

Some sources where I collected information: