As an enterprise architect/infrastructure developer I'm faced with a lot of problems that seem trivial on a small scale but become critical when you need to manage thousands of workstations or servers. One of them is hard drive partitioning.
For security and stability reasons it is wise to split your root file system into several pieces - my default is:
/boot
/
/tmp
/var
/var/log
/home
swap
This may seem like a crazy layout, but when you think about it, many problems start with a server getting rootkitted through some trivial exploit that triggers a buffer overflow and thereby gets a shell. If /tmp, /var, /var/log and /home are mounted noexec, then even when the exploit manages to download its payload to one of them, the payload can't be executed directly from there. That puts you one step closer to avoiding a security breach.
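To check that the options really are in effect on a running machine, something like the following can help. It is only a minimal sketch: the function name `missing_noexec` is made up for this example, and it assumes a file in the usual /proc/mounts format (device, mount point, filesystem type, options).

```shell
#!/bin/bash
# Print every given mount point that is NOT mounted with noexec.
# $1 is a file in /proc/mounts format, the rest are mount points to check.
missing_noexec() {
    local mounts_file=$1; shift
    local mp opts
    for mp in "$@"; do
        # Options field of the last entry for this mount point
        # (empty if the path is not a separate mount at all).
        opts=$(awk -v mp="$mp" '$2 == mp { o = $4 } END { print o }' "$mounts_file")
        case ",$opts," in
            *,noexec,*) ;;          # protected, nothing to report
            *) echo "$mp" ;;        # not a separate mount, or exec allowed
        esac
    done
}

# Usage: missing_noexec /proc/mounts /tmp /var /var/log /home
```

A path that is not its own mount point is reported as well, because it then inherits whatever options its parent filesystem has.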
As I said before - it looks like a trivial task: install the server properly and that's it. That advice is good for up to 25 to 50 machines per admin, but it fails miserably when you are running hundreds or thousands of machines. You may succeed in installing them, but soon after you are faced with partitions that need to be resized because one is almost empty and another is constantly filling up.
If it's just a PC you can take it offline and reinstall, or boot to rescue mode and resize the partitions. But when you are running mission critical servers that's not an option. The same goes when you are running systems with different hardware configurations and therefore different disk sizes. This is where the Logical Volume Manager (LVM) comes in. It adds a layer of abstraction on top of the physical disks, which gives you another layer of dynamic configuration.
The greatest advantage of using LVM is that you can create, resize and delete volumes on the fly without rebooting the server or booting to rescue mode. Sure, it is not as simple as it sounds, but basically it works wonders. It also lets you build the LVM volume group on top of a whole hard drive or RAID device, leaving you (in the ideal case) with only 1 partition on the device (I use 2 - I like to keep /boot separate).
So in my case it is not wise to partition the drive "fully", but to keep the volume sizes to a minimum (that means what is needed plus 10% overhead) and leave the rest of the drive free. This allows me to simply "add" space to the volume that needs it without having to shrink another volume (a volume can't be mounted while it is being shrunk, so that would mean rescue mode!!!). Also, keeping the operating system in one volume group and the data in another can save a lot of work when restoring a system. For this I use software I designed myself called FabricManager - looks like I have to blog about it in the future...
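Growing a mounted volume comes down to two steps: extend the logical volume, then grow the filesystem into the new space. The sketch below is an illustration under assumptions, not a recipe: `vg0` and `root` are hypothetical names, the filesystem is assumed to be ext3, and `resize2fs` is assumed to be recent enough to grow a mounted filesystem (older releases may need a separate tool for that). It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Grow a logical volume and its ext filesystem while it stays mounted.
# DRY_RUN=1 (the default here) only prints what would be executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

grow_lv() {
    local vg=$1 lv=$2 extra=$3      # e.g. vg0 root 2G (hypothetical names)
    local dev="/dev/$vg/$lv"
    run lvextend -L "+$extra" "$dev" &&   # take free extents from the volume group
    run resize2fs "$dev"                  # grow the filesystem to fill the volume
}

grow_lv vg0 root 2G
```

This only works as long as the volume group still has free extents, which is exactly why the volumes are kept small at install time.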
This works well until you run into a problem: you have 1000 workstations installed with this configuration and you want to do a dist upgrade. All is good until you run out of space in the root volume. 80% of the disk is filled with downloaded RPMs and there is no space left to install them. With LVM that is no problem, you just add some space to the root volume and all works out well. But here comes the catch - how long will it take to manually resize the root volume on 1000 workstations? I'm guessing around a month or two, and that is the best case scenario. Assuming that a good security policy is in place and no two workstations have the same root password, it could stretch to a few years, which defeats the whole point of upgrading.
Sure, if all the workstations have been installed with the same root password you can use a script to run the command automatically, but then you will run into the roughly 10% of computers that won't accept the password and the 10 to 20% of PCs that are turned off at the time you run it. What about a strict security policy that won't even allow root to log in remotely? What if the workstations are in different locations, countries or even continents? You can't just run the script a few times and hope for the best, because some workstations would get several times the space needed and some would get none.
And so a trivial problem grew out of proportion in seconds, and one small design flaw can cost A LOT of time when all the problems have to be fixed separately. This led me to write a small shell script that creates and extends logical volumes as needed, and it can be found here. It is not 100% what I need, but the core functionality is there and it works.
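To show the idea (this is not the script itself), here is a minimal sketch of "create or extend as needed". All function names are invented for this example, sizes are in MB, the filesystem is assumed to be ext3, and the volume is never shrunk. Because it compares the current size with the wanted size, running it several times gives the same result as running it once.

```shell
#!/bin/bash
# Decide what to do from the current and the wanted size in MB.
# An empty current size means the volume does not exist yet.
plan_lv() {
    local current_mb=$1 wanted_mb=$2
    if [ -z "$current_mb" ]; then
        echo "create $wanted_mb"
    elif [ "$current_mb" -lt "$wanted_mb" ]; then
        echo "extend $((wanted_mb - current_mb))"
    else
        echo "keep"                 # never shrink a volume
    fi
}

# Current size of VG/LV in whole MB, or nothing if lvs finds no such volume.
lv_size_mb() {
    lvs --noheadings --nosuffix --units m -o lv_size "$1/$2" 2>/dev/null |
        awk '{ printf "%d", $1 }'
}

# Bring VG/LV to at least the wanted size (needs root and the LVM tools).
ensure_lv() {
    local vg=$1 lv=$2 wanted_mb=$3 plan action amount
    plan=$(plan_lv "$(lv_size_mb "$vg" "$lv")" "$wanted_mb")
    action=${plan%% *}
    amount=${plan#* }
    case $action in
        create) lvcreate -L "${amount}M" -n "$lv" "$vg" &&
                mkfs.ext3 -q "/dev/$vg/$lv" ;;
        extend) lvextend -L "+${amount}M" "/dev/$vg/$lv" &&
                resize2fs "/dev/$vg/$lv" ;;
        keep)   : ;;
    esac
}

# Usage (as root): ensure_lv vg0 var 2048
```

Keeping the decision in `plan_lv` separate from the commands makes it possible to try out the logic on a machine without any LVM at all.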
This also solves problems with the few "special" applications that need their own partition to run. As I strive for simplicity, I can now package the script into the autoupdate package (right... haven't mentioned that before either... need to fix that in the near future...) and set a dependency on it. So to automate the install of such an application I just add one row to the %pre section of its package and voilà - a new volume is created at install time :D
It also helps with resizing volumes: I simply update the base package of my configuration, which contains the sizes of the required volumes. This gives me the ability to bring the workstations to a new configuration in a few days, and life can go on...
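A size list like that could be as simple as a text file that gets walked line by line. The sketch below assumes a made-up format of "volume group, volume name, size in MB" per line, and the handler is whatever command does the actual create/extend work; the name `apply_volume_table` is hypothetical too.

```shell
#!/bin/bash
# Call a handler for every "vg lv size_mb" line of a table file.
# Blank lines and lines starting with '#' are skipped.
apply_volume_table() {
    local table=$1 handler=$2 vg lv size_mb
    # Read the table on fd 3 so the handler can't swallow it via stdin.
    while read -r vg lv size_mb <&3 || [ -n "$vg" ]; do
        case $vg in ''|'#'*) continue ;; esac
        "$handler" "$vg" "$lv" "$size_mb" || return 1
    done 3< "$table"
}

# Usage: apply_volume_table /etc/volumes.conf some_handler
# where /etc/volumes.conf (hypothetical path) contains lines such as:
#   vg0 root 4096
#   vg0 var  2048
```

Stopping at the first failing line is a deliberate choice here: a half-applied table is easier to notice than one that silently skipped a volume.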
In conclusion: security is important, so no cutbacks should be made there. noexec and nodev are your friends - use them wisely. LVM is a great tool but it is not a magic bullet: an extra abstraction layer multiplies the complexity of the system, so use it carefully. And when developing large scale infrastructure, multiply the number of computers by 1000 and ask yourself whether the solution is still easy to manage.
This script was built for CentOS 4.4 and tested on it - it should work on other distros but may need some modifications.