Interactive-diagrams GSoC progress report

Intro

As some of you may already know, I’ve published the first demo version of the interactive-diagrams online, it can be found at http://paste.hskll.org (thanks to my mentor Luite Stegeman for hosting). It’s not very interactive yet, but it’s a good start. At the same time it took me a while to get everything up and running so in this blog post I would like to describe and discuss the overall structure and design of the project along with some details about the vast number of security restrictions that are being used.

Please note that http://paste.hskll.org is just a demo version and I can guarantee neither the safety of your pastes nor the uptime of the app. The ‘release notes’ can be found here.

If you have any suggestions or bug reports don’t hesitate to mail me (difrumin аt gmail dоt com) or use the bugtracker.

System requirements

GNU/Linux operating system, GHC 7.7 (I think it’s possible to make the whole thing work with GHC 7.6 but I don’t have time to support it and test everything), lots of RAM. In order to use some security restrictions you will also need SELinux and cgroup.

High-level structure

The whole program consists of three main components (it would be better to say three main types of components since there are usually multiple workers in the system):

  • The web app (sources can be found in scotty-pastebin), powered by WAI, Scotty and Blaze;
  • The service app (eval-api/src-service);
  • Workers (eval-api/src).

The web server handles user requests, database logic, and renders the results. Workers are the processes that perform the actual evaluation. The service component is the one that handles the pool of workers, keeps track of how many workers are available and forks new workers if necessary. The web app does not communicate with workers without the permission of the service.

All the communication between the components is performed with the help of UNIX sockets.

Request example

Here’s an example workflow in the system:

  1. User connects to the web server, sends the request to evaluate a certain bit of code.
  2. Web server talks to the service, requesting a worker.
  3. Server reuses an existing worker if an idle one exists. Otherwise it forks a new one or blocks if the limit is reached.
  4. Worker, upon starting, loads the necessary libraries and applies security restrictions.
  5. The web server receives a worker and sends it a request for evaluation.
  6. The server waits, if there is no reply from the worker after a certain amount of time, it sends a message to the service saying that the worker timed out. If the web service receives the reply, it stores the result in the database and continues with the user request.
  7. When the service receives message about one if its workers it decides whether to kill/restart it or not. If the worker’s process has timed out or results in an error (eg: out of memory exception) then the service restarts it.

Component permissions

Setting up the right permissions for the components is a crucial part in creating a secure environment. Depending on what security restrictions you have enabled you might want to choose different permissions for the processes. On http://paste.hskll.org we use the full set of security restrictions and limits (see the next section) and it requires us to give the components certain permissions.

  • scotty-pastebin is run as a user in a multithreaded runtime;
  • eval-service is run as a superuser (required for setting up chrooted jails) in a single-threaded environment (required due to forking/SELinux restrictions, see the SELinux section for details), listens on the ‘control’ socket;
  • workers are forked from eval-service as root, but they change their processes’ uid as soon as possible, listens on the ‘workerN’ socket (opened prior to chroot’ing);

Additionally the whole thing runs in a VM.

See also this wiki page written by luite

Security limitations and restrictions

Interactive-digrams applies a whole lot of limitations to the worker processes, which can be configured using the following datatype:

data LimitSettings = LimitSettings
    { -- | Maximum time for which the code is allowed to run
      -- (in seconds)
      timeout     :: Int
      -- | Process priority for the 'nice' syscall.
      -- -20 is the highest, 20 is the lowest
    , niceness    :: Int
      -- | Resource limits for the 'setrlimit' syscall
    , rlimits     :: Maybe RLimits
      -- | The directory that the evaluator process will be 'chroot'ed
      -- into. Please note that if chroot is applied, all the pathes
      -- in 'EvalSettings' will be calculated relatively to this
      -- value.
    , chrootPath  :: Maybe FilePath
      -- | The UID that will be set after the call to chroot.
    , processUid  :: Maybe UserID
      -- | SELinux security context under which the worker 
      -- process will be running.
    , secontext   :: Maybe SecurityContext
      -- | A filepath to the 'tasks' file for the desired cgroup.
      -- 
      -- For example, if I have mounted the @cpu@ controller at
      -- @/cgroups/cpu/@ and I want the evaluator to be running in the
      -- cgroup 'idiaworkers' then the 'cgroupPath' would be
      -- @/cgroups/cpu/idiaworkers@
    , cgroupPath  :: Maybe FilePath
    } deriving (Eq, Show, Generic)

There is also a Default instance for LimitSettings and RLimits with most of the restrictions turned off:

defaultLimits :: LimitSettings
defaultLimits = LimitSettings
    { timeout    = 3
    , niceness   = 10
    , rlimits    = Nothing
    , chrootPath = Nothing
    , processUid = Nothing
    , secontext  = Nothing 
    , cgroupPath = Nothing
    }

Below I’ll briefly describe each limitation/restriction with some details.

Timeout & niceness

The timeout field specifies how much time (in seconds) the server waits for the worker. (Note: this is the only limitation that is controlled on the side of the web server. The corresponding procedure is processTimeout. We really want this to be run in the multithreaded environment)

Niceness is merely the value passed to the nice() syscall, nothing special.

rlimits

The resource limits are controlled by syscalls to setrlimit. The limits itself are defined in the RLimits datatype:

data RLimits = RLimits
    { coreFileSizeLimit :: ResourceLimits
    , cpuTimeLimit      :: ResourceLimits
    , dataSizeLimit     :: ResourceLimits
    , fileSizeLimit     :: ResourceLimits
    , openFilesLimit    :: ResourceLimits
    , stackSizeLimit    :: ResourceLimits
    , totalMemoryLimit  :: ResourceLimits
    } deriving (Eq, Show, Generic)

ResourceLimits itself is defined in System.Posix.Resource. For more information on resource limits see setrlimit(3).

Chrooted jail

In order to restrict the worker process we run it inside the chroot jail. The easiest way to create a fully working jail is to use debootstrap. It’s also necessary to install gcc and GHC libraries inside the jail.

mkdir
sudo debootstrap wheezy /idia/run/workers/worker1
sudo chmod  /idia/run/workers/worker1
cd /idia/run/workers/worker1
sudo mkdir -p ./home/
sudo chown  ./home/
cd ./home/
mkdir .ghc && sudo mount --bind ~/.ghc .ghc
mkdir .cabal && sudo mount --bind ~/.cabal .cabal
mkdir ghc && sudo mount --bind ~/ghc ghc # ghc libs
cd ../..
cp ~/interactive-diagrams/common/Helper.hs .
sudo chroot .
apt-get install gcc # inside the chroot

I tried installing Emdebian using multistrap to reduce the size of the jail, but GHC won’t run properly in that environment, complaining about librt.so (which was present in the system), so I decided to stick with debootstrap. If anyone knows how to avoid this problem with multistrap please mail me or leave a comment.

Process uid

This is the uid the worker process will run under. The socket file will also be created by the user with this uid.

SELinux

SELinux (Security-enhanced Linux) is a Linux kernel module providing mechanisms for enforcing fine-grained mandatory access control, brought to you by the creators of infamous PRISM!

SELinux allows the system administrator to control the security of the system by specifying in (modular) policy files the AVC (access vector cache) rules. The SELinux kernel module sits there and monitors all the syscalls and, if it finds something that is not explicitly allowed in the policy, it blocks it. (Well, actually something a little bit different is going on, but for the sake of simplicity I am leaving it out)

Everything on your system – files, network sockets, file handles, processes, directories – is labelled with a SELinux security context, which consists of a role, a user name (not related to the regular system user name) and a domain (also called type in some literature). In the policy file you specific which domains are allowed to perform various actions on other domains. A typical piece of the policy file will look like this:

allow myprocess_t self:udp_socket { create connect };
allow myprocess_t bin_t:file { execute };

The first line states that the process from the domain myprocess_t is allowed to create and connect to the UDP sockets of the same domain. The second line allows a process in that domain to execute files of type bin_t (usually files in /bin/ and /usr/bin).

Note: the secontext field actually contains only the
security domain. When the worker process is changing the security
context it uses the same user/resource as it originally had.

In our SELinux policy we have several domains:

  • idia_web_t – the domain under which the scotty-pastebin run
  • idia_web_exec_t – the domain of the scotty-pastebin executable and other files associated with that binary
  • idia_service_t – the domain under which the eval-service run
  • idia_service_exec_t – the domain of the eval-service executable and other files associated with that binary
  • idia_service_sock_t – UNIX socket files used for communication
  • idia_db_t, idia_pkg_t, idia_web_common_t – database files, packages, html files, templates and other stuff
  • idia_worker_env_t – chroot’ed environment in which the worker operates
  • idia_restricted_t – the most restricted domain in which the workers run and evaluate code

The reason we made the service program run in a single-threaded environment is the following: if we run it in the multi-threaded environment (like we wanted) the worker processes want to have access to file descriptors, inherited from the idia_service_t, which, of course, is dangerous and should not be allowed.

I personally don’t enjoy using SELinux very much. It’s very hard to configure it and among its shortcomings I can list the fact that there is no distinction between file types and process types; there is no proper separation, even when using the modular policy, as the duplicated types are checked when you load the module and there is no way (that I know of) to easily introduce a fresh unused type. And there is this thing that puzzled me for quite a while: home directories are treated specially. Even if you configure a subdirectory in your home dir to have a specific security context, restorecon won’t correctly install the context specified in the policy. You actually have to set he context yourself, using chcon.

Cgroups

Cgroups is the system that can be used to aid the way Linux schedules CPU time/shares, distributes memory to the processes. It does so by organizing processes into hierarchical groups with configured behaviour.

Installing cgroups on debian is somewhat tricky, because the package is a little bit weird.

sudo apt-get install cgroup-bin libcgroup1
sudo cgconfigparser -l ~/interactive-diagrams/cgconfig.conf

For our purposes we have a cgroup called idiaworkers. We also mount the cpu controller on /cgroups/cpu:

$> ls -l /cgroups/cpu/
total 0
-rw-r--r--. 1 root root 0 Jul 12 16:22 cgroup.clone_children
--w--w--w-. 1 root root 0 Jul 12 16:22 cgroup.event_control
-rw-r--r--. 1 root root 0 Jul 12 16:22 cgroup.procs
-rw-r--r--. 1 root root 0 Jul 12 16:22 cpu.shares
drwxr-xr-x. 2 root root 0 Jul 12 16:22 idiaworkers
-rw-r--r--. 1 root root 0 Jul 12 16:22 notify_on_release
-rw-r--r--. 1 root root 0 Jul 12 16:22 release_agent
-rw-r--r--. 1 root root 0 Jul 12 16:22 tasks
$> ls -l /cgroups/cpu/idiaworkers
total 0
-rw-r--r--. 1 root    root 0 Jul 12 16:22 cgroup.clone_children
--w--w--w-. 1 root    root 0 Jul 12 16:22 cgroup.event_control
-rw-r--r--. 1 root    root 0 Jul 12 16:22 cgroup.procs
-rw-r--r--. 1 root    root 0 Jul 12 16:22 cpu.shares
-rw-r--r--. 1 root    root 0 Jul 12 16:22 notify_on_release
-rw-r--r--. 1 vagrant root 0 Jul 14 06:21 tasks

In order to modify how much CPU time our group gets, we write to the cpu.shares file: sudo echo 100 > /cgroups/cpu/idiaworkers/cpu.shares. If we want to add the task/process to the group we simply append the tasks file: echo $PID >> /cgroups/cpu/idiaworkers/tasks. The workers append themselves to the task file automatically (if the cgroup restrictions are enabled in the LimitSettings).

Open problems/requests

I am still not sure how do I write tests for this project. Do I write tests for my GHC API wrappers? Do I write tests for my workers pool? I probably should take a look how similar projects handles those.

Outro

So, as you can see, we have something working here and now that we manage to take the initial steps it will be much easier for us to push changes and make them available for public to use and comment on. There is still a long way to come. The code needs some serious cleanup (we’ve switched the design model a couple of weeks ago, which affected the internal structure seriously), the documentation needs to be written. And of course new features are waiting to be implemented :) We will be supporting multiple UIDs for workers and looking into using LXC for simplifying the setup process too.

I would like to thank augur and luite for their editorial feedback.

Stay tuned for the next posts about configuring the program for evaluation settings and reusing the components from the library.

About these ads

4 thoughts on “Interactive-diagrams GSoC progress report

  1. Pingback: ANN: restricted-workers-0.1.0 | (parentheses)

  2. Pingback: GSoC 2013, an afterword | (parentheses)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s