Friday, May 30, 2014

Cross platform python method for calculating disk free space

I wanted a cross platform (Mac/Win/Linux) method for calculating disk free space. For my application ideally I'd like to: read it out of a file (or registry key) somewhere, failing that have a python library do it for me, and failing that shell out to something and parse the output. I couldn't find the free space in any files (including /proc), registry keys, or anywhere else file-like. So like this stackoverflow thread, I came to the conclusion that WMI for windows and os.statvfs for Mac+Linux were the best options.

First Windows, it's fairly straightforward. The MS doco for this WMI call is here, and it also explains the DriveType codes.
PS > $query = "select * from win32_logicaldisk"

PS > Get-WmiObject -Query $query


DeviceID     : C:
DriveType    : 3
ProviderName : 
FreeSpace    : 190249115648
Size         : 249690058752
VolumeName   : 

DeviceID     : Z:
DriveType    : 4
ProviderName : \\share\homedir\username
FreeSpace    : 15784280064
Size         : 26843545600
VolumeName   : nethomes$
Those numbers are bytes, so windows is pretty easy. On to the mess that is statvfs. First, the statfs man page gives a little history:
The original Linux statfs() and fstatfs() system calls were not designed with extremely large file sizes in mind. Subsequently, Linux 2.6 added new statfs64() and fstatfs64() system calls that employ a new structure, statfs64. The new structure contains the same fields as the original statfs structure, but the sizes of various fields are increased, to accommodate large file sizes. The glibc statfs() and fstatfs() wrapper functions transparently deal with the kernel differences. Some systems only have , other systems also have , where the former includes the latter. So it seems including the former is the best choice. LSB has deprecated the library calls statfs() and fstatfs() and tells us to use statvfs(2) and fstatvfs(2) instead.
Sounds like we should use statvfs and python has a os.statvfs so we should be good. Don't get fooled by this nasty deprecation notice, it's referring the the statvfs module which just defined a few constants. That's deprecated, but the os.statvfs function is alive and well in recent Python versions.

But wait, there's chatter about statvfs being dangerous on glibc systems and the df code said not to use it at some stage. Basically if you have a network filesystem listed in /proc/mounts and it is unreachable (e.g. because there is no network), statvfs will hang on stat'ing the network directory, even if you called statvfs on a completely different directory. df works around this by continuing to use statfs on glibc systems. I tested this with strace and it's true on my Ubuntu linux machine:
$ strace df
[snip]
statfs("/usr/local/home/user", {f_type=0x65735546, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=1024, f_frsize=4096}) = 0
statfs("/nethome/user", {f_type="NFS_SUPER_MAGIC", f_bsize=8192, f_blocks=367001600, f_bfree=159547821, f_bavail=159547821, f_files=31876689, f_ffree=12707362, f_fsid={0, 0}, f_namelen=255, f_frsize=8192}) = 0
[snip]
We can see that python os.statvfs is doing the same (and so is "stat -f"). So we should be safe using python's os.statvfs.
$ strace python -c "import os;os.statvfs('/')" 2>&1 | grep statfs
statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9743394, f_bfree=5606442, f_bavail=5118199, f_files=2441216, f_ffree=2066663, f_fsid={1820746783, 1207614867}, f_namelen=255, f_frsize=4096}) = 0

# No statvfs calls
$ strace python -c "import os;os.statvfs('/')" 2>&1 | grep statvfs
execve("/usr/bin/python", ["python", "-c", "import os;os.statvfs('/')"], [/* 56 vars */]) = 0

# stat -f does the same
$ strace stat -f / 2>&1 | grep statfs
statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=9743394, f_bfree=5606442, f_bavail=5118199, f_files=2441216, f_ffree=2066663, f_fsid={1820746783, 1207614867}, f_namelen=255, f_frsize=4096}) = 0
The next question is, how do you actually calculate the free space in bytes? Starting with: what is the block size? f_bsize is the "Preferred file system block size" and f_frsize is the "Fundamental file system block size" according to the python doco, and if you read the statfs man page it says "optimal transfer block size" and "fragment size (since Linux 2.6)" respectively. Confusing much?

On my linux machine they are the same:
In [6]: import os

In [7]: st = os.statvfs("/")

In [8]: st.f_bsize
Out[8]: 4096

In [9]: st.f_frsize
Out[9]: 4096

In [10]: !stat -f -c "Block size (for faster transfers): %s, Fundamental block size (for block counts): %S" /
Block size (for faster transfers): 4096, Fundamental block size (for block counts): 4096
On OS X they are not:
In [1]: import os

In [2]: st = os.statvfs("/")

In [3]: st.f_bsize
Out[3]: 1048576

In [4]: st.f_frsize
Out[4]: 4096
So on OS X f_bsize is 1MB, but that isn't actually the block size used by the filesystem, so using f_frsize looks like the best option for both platforms. The remaining sticking point is that pre-2.6-kernel linux machines don't have f_frsize, so we should check if it is zero and use f_bsize instead in that case.

OK so we have a blocksize, but what free size should we use? f_bfree is "free blocks in fs" and f_bavail is "free blocks available to unprivileged user". These can actually be quite different, e.g. mkfs.ext3 reserves 5% of the filesystem blocks for the super-user by default. Which one you care about probably depends on why you are measuring free disk space. In my case I chose f_bavail, (which is also what df reports).

The final product:
In [16]: def PrintFree(path):
   ....:     st = os.statvfs(path)
   ....:     if st.f_frsize:
   ....:         print "Free bytes: %s" % (st.f_frsize * st.f_bavail) 
   ....:     else:
   ....:         print "Free bytes: %s" % (st.f_bsize * st.f_bavail)
   ....:         

In [17]: PrintFree("/")
Free bytes: 127470809088

In [18]: !df -B 1
Filesystem                1B-blocks        Used    Available Use% Mounted on
/dev/sda1              153117560832 17845137408 127470809088  13% /

No comments: