GHC API: Interpreted, compiled and package modules

The third post in the series.

Intro

It’s hard to get into writing code that uses GHC API. The API itself is huge, and the number of functions and options significantly outnumbers the tutorials around.

In this series of blog posts I’ll elaborate on some of the peculiar, interesting problems I’ve encountered during my experience writing code that uses GHC API and also provide various tips I find useful.

I have built for myself a small layer of helper functions that helped me with using GHC API for the interactive-diagrams project. The source can be found on GitHub and I plan on refactoring the code and releasing it separately.

Today I would like to talk about the different ways of bringing the contents of Haskell modules into scope, a process that is necessary for evaluating/interpreting bits of code on-the-fly.

Many of the points I make in this post are actually trivial, but nevertheless I made all of the mistakes I mention in this post, perhaps due to my naive approach of quickly diving in and experimenting instead of reading the documentation and source code. Now I actually realize that this post should have been the first in the series, since it probably deals with more basic (and fundamental) stuff than the previous two posts. But anyway, here it is.

Interpreted modules

Imagine the following situation: we have a Haskell source file with code we want to load dynamically and evaluate. That is a basic task in GHC API terms, but nevertheless there are some caveats. We start with the basics.

Let us have a file ‘test.hs’ containing the code we want to access:

module Test (test) where
test :: Int
test = 123

The basic way to get the ‘test’ data would be to load ‘Test’ as an interpreted module:

import Control.Applicative
import DynFlags
import GHC
import GHC.Paths
import GhcMonad            (liftIO) -- from ghc7.7 and up you can use the usual
    -- liftIO from Control.Monad.IO.Class
import Unsafe.Coerce

main = defaultErrorHandler defaultFatalMessager defaultFlushOut $ do
    runGhc (Just libdir) $ do
        -- we have to call 'setSessionDynFlags' before doing
        -- anything else
        dflags <- getSessionDynFlags
        -- If we want to make GHC interpret our code on the fly, we
        -- ought to set those two flags, otherwise we
        -- wouldn't be able to use 'setContext' below
        setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                    , ghcLink   = LinkInMemory
                                    }
        setTargets =<< sequence [guessTarget "test.hs" Nothing]
        load LoadAllTargets
        -- Bringing the module into the context
        setContext [IIModule $ mkModuleName "Test"]

        -- evaluating and running an action
        act <- unsafeCoerce <$> compileExpr "print test"           
        liftIO act

The reason we have to use HscInterpreted and LinkInMemory is that otherwise GHC would compile test.hs in the current directory and leave test.hi and test.o files behind, which we would not be able to load in interpreted mode. setContext, however, will try to bring in the code from those files first when looking for the module ‘Test’.

dan@aquabox
[0] % ghc --make target.hs -package ghc
[1 of 1] Compiling Main             ( target.hs, target.o )
Linking target ...

dan@aquabox
[0] % ./target
123
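The unsafeCoerce above trusts that the compiled expression really has the type we expect. As a hedged alternative, here is a minimal sketch that uses dynCompileExpr from the GHC module together with fromDynamic, so that a type mismatch shows up as Nothing rather than as a crash. The helper name evalTestAsInt is made up for illustration, and the code assumes the same GHC 7.6-era API and the same test.hs as above.

```haskell
import Data.Dynamic (fromDynamic)
import DynFlags
import GHC
import GHC.Paths (libdir)

-- | Hypothetical helper: load test.hs, evaluate the expression "test"
-- and return it as an Int, or Nothing if it has some other type.
evalTestAsInt :: IO (Maybe Int)
evalTestAsInt = runGhc (Just libdir) $ do
    dflags <- getSessionDynFlags
    _ <- setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                     , ghcLink   = LinkInMemory
                                     }
    target <- guessTarget "test.hs" Nothing
    setTargets [target]
    _ <- load LoadAllTargets
    setContext [IIModule $ mkModuleName "Test"]
    -- dynCompileExpr wraps the value in a Dynamic, which records its type
    dyn <- dynCompileExpr "test"
    return (fromDynamic dyn)
```

The price is that the result type has to be Typeable, so this does not replace unsafeCoerce in every situation.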

Let’s try something fancier like printing a list of integers, one-by-one.

main = defaultErrorHandler defaultFatalMessager defaultFlushOut $ do
    runGhc (Just libdir) $ do
        dflags <- getSessionDynFlags
        setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                    , ghcLink   = LinkInMemory
                                    }
        setTargets =<< sequence [guessTarget "test.hs" Nothing]
        load LoadAllTargets
        -- Bringing the module into the context
        setContext [IIModule $ mkModuleName "Test"]

        -- evaluating and running an action
        act <- unsafeCoerce <$> compileExpr "forM_ [1,2,test] print"
        liftIO act

But when we try to run it...

dan@aquabox
[0] % ./target
target: panic! (the 'impossible' happened)
  (GHC version 7.6.3 for x86_64-apple-darwin):
        Not in scope: `forM_'

Please report this as a GHC bug:

http://www.haskell.org/ghc/reportabug

Hm, it looks like we need to bring Control.Monad into the scope.

This brings us to the next section.

Package modules

Naively, we might want to load Control.Monad in the same fashion as we loaded test.hs:

main = defaultErrorHandler defaultFatalMessager defaultFlushOut $ do
    runGhc (Just libdir) $ do
        dflags <- getSessionDynFlags
        setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                    , ghcLink   = LinkInMemory
                                    }
        setTargets =<< sequence [ guessTarget "test.hs" Nothing
                                , guessTarget "Control.Monad" Nothing]
        load LoadAllTargets
        -- Bringing the module into the context
        setContext [IIModule $ mkModuleName "Test"]

        -- evaluating and running an action
        act <- unsafeCoerce <$> compileExpr "forM_ [1,2,test] print"
        liftIO act

Our attempt fails:

dan@aquabox
[0] % ./target
target: panic! (the 'impossible' happened)
  (GHC version 7.6.3 for x86_64-apple-darwin):
        module `Control.Monad' is a package module

Please report this as a GHC bug:

http://www.haskell.org/ghc/reportabug

Huh, what? I thought guessTarget works on all kinds of modules.

Well, it does. But it doesn’t “load the module”; it merely sets it as a target for compilation. Together with load LoadAllTargets, it does what ghc --make does. And it doesn’t make much sense to ghc --make Control.Monad when Control.Monad is a module from the base package. What we need to do instead is bring the compiled Control.Monad module into scope. Luckily, that’s not very hard to do with the help of simpleImportDecl :: ModuleName -> ImportDecl name:

main = defaultErrorHandler defaultFatalMessager defaultFlushOut $ do
    runGhc (Just libdir) $ do
        dflags <- getSessionDynFlags
        setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                    , ghcLink   = LinkInMemory
                                    }
        setTargets =<< sequence [ guessTarget "test.hs" Nothing ]
        load LoadAllTargets
        -- Bringing the module into the context
        setContext [ IIModule . mkModuleName $ "Test"
                   , IIDecl
                     . simpleImportDecl
                     . mkModuleName $ "Control.Monad" ]

        -- evaluating and running an action
        act <- unsafeCoerce <$> compileExpr "forM_ [1,2,test] print"
        liftIO act

And we can run it:

dan@aquabox
[0] % ./target
1
2
123
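To keep the two kinds of imports apart, one could wrap them in a small helper. This is only a sketch: mkContext is a hypothetical name, and it assumes (as in the GHC 7.6 API used above) that IIModule takes a ModuleName.

```haskell
import GHC

-- | Hypothetical helper: build a context from the names of interpreted
-- (home) modules and the names of modules to be imported in the usual
-- way, such as modules that come from packages.
mkContext :: [String] -> [String] -> [InteractiveImport]
mkContext interpreted imported =
       map (IIModule . mkModuleName) interpreted
    ++ map (IIDecl . simpleImportDecl . mkModuleName) imported
```

With it, the setContext call above could be written as setContext (mkContext ["Test"] ["Control.Monad"]).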

Compiled modules

What we have implemented so far corresponds to the :load* command in GHCi, which gives us full access to the source code of the program. To illustrate this, let’s modify our test file:

module Test (test) where

test :: Int
test = 123

test2 :: String
test2 = "Hi"

Now, if we try to load that file as an interpreted module and evaluate test2 nothing will stop us from doing so.

dan@aquabox
[0] % ./target-interp
(123,"Hi")
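The code for target-interp is not shown above; a program along the following lines would do the job. It is the first example from this post with a different expression, and the name targetInterp is just a placeholder for the program’s main action.

```haskell
import Control.Applicative
import DynFlags
import GHC
import GHC.Paths
import GhcMonad            (liftIO)
import Unsafe.Coerce

-- | Placeholder name: load test.hs as an interpreted module and print
-- both the exported 'test' and the non-exported 'test2'.
targetInterp :: IO ()
targetInterp = defaultErrorHandler defaultFatalMessager defaultFlushOut $ do
    runGhc (Just libdir) $ do
        dflags <- getSessionDynFlags
        _ <- setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                         , ghcLink   = LinkInMemory
                                         }
        setTargets =<< sequence [guessTarget "test.hs" Nothing]
        _ <- load LoadAllTargets
        -- IIModule exposes the whole top level of Test, not just its
        -- export list, which is why test2 is visible here
        setContext [IIModule $ mkModuleName "Test"]
        act <- unsafeCoerce <$> compileExpr "print (test, test2)"
        liftIO act
```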

To use the compiled module we have to bring Test into the context the same way we dealt with Control.Monad:

main = defaultErrorHandler defaultFatalMessager defaultFlushOut $ do
    runGhc (Just libdir) $ do
        dflags <- getSessionDynFlags
        setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                    , ghcLink   = LinkInMemory
                                    }
        setTargets =<< sequence [ guessTarget "Test" Nothing ]
        load LoadAllTargets
        -- Bringing the module into the context
        setContext [ IIDecl $ simpleImportDecl (mkModuleName "Test")
                   , IIDecl $ simpleImportDecl (mkModuleName "Prelude")
                   ]
        printExpr "test"
        printExpr "test2"

printExpr :: String -> Ghc ()
printExpr expr = do
    liftIO $ putStrLn ("-- Going to print " ++ expr)
    act <- unsafeCoerce <$> compileExpr ("print (" ++ expr ++ ")")
    liftIO act

Output:

dan@aquabox : ~/snippets/ghcapi
[0] % ./target
-- Going to print test
123
-- Going to print test2
target: panic! (the 'impossible' happened)
  (GHC version 7.6.3 for x86_64-apple-darwin):
	Not in scope: `test2'
Perhaps you meant `test' (imported from Test)

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

Note: I had to bring the Prelude into context this time, like a regular module. I tried setting the ideclImplicit option in ImportDecl, but it didn’t work for some reason. Maybe it is actually supposed to do something other than what I think it does.
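ImportDecl is an ordinary record, so its other fields can be adjusted with record update syntax. As an illustration (not something the code above needs), here is a sketch of a qualified import. It assumes the GHC 7.6 shape of the record, where ideclQualified is a Bool and ideclAs is a Maybe ModuleName; later GHC versions changed these field types. The name qualifiedAs is hypothetical.

```haskell
import GHC

-- | Hypothetical helper, the equivalent of writing
--
-- > import qualified <modName> as <alias>
--
-- Assumes the GHC 7.6 field types of 'ImportDecl'.
qualifiedAs :: String -> String -> InteractiveImport
qualifiedAs modName alias =
    IIDecl $ (simpleImportDecl (mkModuleName modName))
        { ideclQualified = True
        , ideclAs        = Just (mkModuleName alias)
        }
```

For example, after adding qualifiedAs "Data.List" "L" to the list passed to setContext, an expression such as "L.sort [3,1,2]" should be in scope for compileExpr.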

Outro

So, that is it: we have managed to dynamically load Haskell source code and evaluate it. I can only refer you to the GHC haddocks for the specific functions that we used in this post; most of them have way more options than we used, and they might prove useful to you.

Adding a package database to the GHC API session

The second post in the series.

Intro

It’s hard to get into writing code that uses GHC API. It’s huge, there are so many options around, and there are not a lot of introduction-level tutorials.

In this series of blog posts I’ll elaborate on some of the peculiar, interesting problems I’ve encountered during my experience writing code that uses GHC API and also provide various tips I find useful.

I have built for myself a small layer of helper functions that helped me with using GHC API for the interactive-diagrams project. The source can be found on GitHub and I plan on refactoring the code and releasing it separately.

One particular thing I had to do was to add a GHC package database to the GHC API session.

For those familiar with the structure of the interactive-diagrams project: since the workers run in a separate environment, each of them has its own chroot jail, including its own package database. I had to manually set up a path to the package database for each worker so it would pick up the necessary packages.

Package databases

A package database is a directory where the information about your installed packages is stored. For each package registered in the database there is a .conf file with the package details. The .conf file contains the package description (just like in the .cabal file) as well as the paths to the binaries and a list of resolved dependencies:

$ cat aeson-0.6.1.0.1-5a107a6c6642055d7d5f98c65284796a.conf
name: aeson
version: 0.6.1.0.1
id: aeson-0.6.1.0.1-5a107a6c6642055d7d5f98c65284796a

import-dirs: /home/dan/.cabal/lib/aeson-0.6.1.0.1/ghc-7.7.20130722
library-dirs: /home/dan/.cabal/lib/aeson-0.6.1.0.1/ghc-7.7.20130722

depends: attoparsec-0.10.4.0-acffb7126aca47a107cf7722d75f1f5e
         base-4.7.0.0-b67b4d8660168c197a2f385a9347434d
         blaze-builder-0.3.1.1-9fd49ac1608ca25e284a8ac6908d5148
         bytestring-0.10.3.0-66e3f5813c3dc8ef9647156d1743f0ef

You can use ghc-pkg to manage installed packages on your system. For example, to list all the packages you’ve installed run ghc-pkg list. To list all the package databases that are automatically picked up by ghc-pkg do the following:

$ ghc-pkg list nonexistentpkg
/home/dan/ghc/lib/ghc-7.7.20130722/package.conf.d
/home/dan/.ghc/i386-linux-7.7.20130722/package.conf.d

See ghc-pkg --help or the online documentation for more details.

Adding a package db

By default GHC knows only about two package databases: the global package database (usually /usr/lib/ghc-something/ on Linux) and the user-specific database (usually ~/.ghc/lib). In order to pick up a package that resides in a different package database you have to employ some tricks.

For some reason GHC API does not export a clear and easy-to-use function that would allow you to do that, although the code we need is present in the GHC sources.

The way this whole thing works is the following:

  1. GHC calls initPackages, which reads the database files and sets up the internal table of package information
  2. The reading of package databases is performed via the readPackageConfigs function. It reads the user package database, the global package database, the “GHC_PACKAGE_PATH” environment variable, and applies the extraPkgConfs function, which is a dynflag and has the following type: extraPkgConfs :: [PkgConfRef] -> [PkgConfRef] (PkgConfRef is a type representing the package database). The extraPkgConfs flag is supposed to represent the -package-db command line option.
  3. Once the database is parsed, the loaded packages are stored in the pkgDatabase dynflag, which is a list of PackageConfigs.

So, in order to add a package database to the current session we simply have to modify the extraPkgConfs dynflag. Actually, there is already a function present in the GHC source that does exactly what we need: addPkgConfRef :: PkgConfRef -> DynP (). Unfortunately it’s not exported, so we can’t use it in our own code. I rolled my own functions that I am using in the interactive-diagrams project; feel free to copy them:

-- Needs {-# LANGUAGE CPP #-}. PkgConfRef and extraPkgConfs come from
-- DynFlags, initPackages from Packages.

-- | Add a package database to the Ghc monad
#if __GLASGOW_HASKELL__ >= 707
addPkgDb :: GhcMonad m => FilePath -> m ()
#else
addPkgDb :: (MonadIO m, GhcMonad m) => FilePath -> m ()
#endif
addPkgDb fp = do
  dfs <- getSessionDynFlags
  let pkg  = PkgConfFile fp
  let dfs' = dfs { extraPkgConfs = (pkg:) . extraPkgConfs dfs }
  _ <- setSessionDynFlags dfs'
  -- initPackages runs in IO, so it is lifted on every GHC version
  _ <- liftIO $ initPackages dfs'
  return ()

-- | Add a list of package databases to the Ghc monad
-- This should be equivalent to
-- > addPkgDbs ls = mapM_ addPkgDb ls
-- but it is actually faster, because it does the package
-- reinitialization after adding all the databases
#if __GLASGOW_HASKELL__ >= 707
addPkgDbs :: GhcMonad m => [FilePath] -> m ()
#else
addPkgDbs :: (MonadIO m, GhcMonad m) => [FilePath] -> m ()
#endif
addPkgDbs fps = do
  dfs <- getSessionDynFlags
  let pkgs = map PkgConfFile fps
  let dfs' = dfs { extraPkgConfs = (pkgs ++) . extraPkgConfs dfs }
  _ <- setSessionDynFlags dfs'
  _ <- liftIO $ initPackages dfs'
  return ()

  • Packages module: contains other functions that modify/make use of extraPkgConfs
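Since extraPkgConfs is described above as the representation of the -package-db command line option, another route would be to let GHC parse that flag itself with parseDynamicFlags, which the GHC module does export. The following is a minimal sketch under that assumption, not the method used in interactive-diagrams; the names addPkgDbViaFlag and example are made up, and the flag spelling is the one from GHC 7.6 and later.

```haskell
import GHC
import GHC.Paths (libdir)

-- | Hypothetical alternative to addPkgDb: let GHC parse the
-- @-package-db@ flag instead of editing 'extraPkgConfs' by hand.
addPkgDbViaFlag :: GhcMonad m => FilePath -> m ()
addPkgDbViaFlag fp = do
  dfs <- getSessionDynFlags
  -- The second and third components are the leftover arguments and the
  -- warnings; both are ignored in this sketch.
  (dfs', _leftovers, _warnings) <-
    parseDynamicFlags dfs [noLoc "-package-db", noLoc fp]
  _ <- setSessionDynFlags dfs'
  return ()

-- | Example session: register the extra database right at the start.
example :: FilePath -> IO ()
example db = runGhc (Just libdir) (addPkgDbViaFlag db)
```

It is probably safest to call such a helper at the very beginning of the session, before any targets are loaded.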

Outro

This was the second post in the series and we have seen how to add a package database to the GHC session. Stay tuned for more brief posts and updates.

On custom error handlers for the GHC API

Intro

It’s hard to get into writing code that uses GHC API. It’s huge, there are so many options around, and there are not a lot of introduction-level tutorials.

In this series of blog posts I’ll elaborate on some of the peculiar, interesting problems I’ve encountered during my experience writing code that uses GHC API and also provide various tips I find useful.

I have built for myself a small layer of helper functions that helped me with using GHC API for the interactive-diagrams project. The source can be found on GitHub and I plan on refactoring the code and releasing it separately.

Error handling

Today I would like to talk about setting your own error handlers for GHC API. By default you can expect GHC to spew all the errors onto your screen, but for my purposes I wanted to log them.

Naturally at first I tried the following:

I am in need of setting up custom exception handlers when using GHC API to compile modules. Right now I have the following piece of code:

-- Main.hs:
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Applicative ((<$>))
import GHC
import GHC.Paths
import MonadUtils
import Exception
import Panic
import Unsafe.Coerce
import System.IO.Unsafe
-- I thought this code would handle the exception
handleException :: (ExceptionMonad m, MonadIO m)
                   => m a -> m (Either String a)
handleException m =
  ghandle (\(ex :: SomeException) -> return (Left (show ex))) $
  handleGhcException (\ge -> return (Left (showGhcException ge ""))) $
  flip gfinally (liftIO restoreHandlers) $
  m >>= return . Right

-- Initializations, needed if you want to compile code on the fly
initGhc :: Ghc ()
initGhc = do
  dfs <- getSessionDynFlags
  setSessionDynFlags $ dfs { hscTarget = HscInterpreted
                           , ghcLink = LinkInMemory }
  return ()

-- main entry point
main = test >>= print

test :: IO (Either String Int)
test = handleException $ runGhc (Just libdir) $ do
  initGhc
  setTargets =<< sequence [ guessTarget "file1.hs" Nothing ]
  graph <- depanal [] False
  loaded <- load LoadAllTargets
  -- when (failed loaded) $ throw LoadingException
  setContext (map (IIModule . moduleName . ms_mod) graph)
  let expr = "run"
  res <- unsafePerformIO . unsafeCoerce <$> compileExpr expr
  return res

-- file1.hs:
module Main where

main = return ()

run :: IO Int
run = do
  n <- x
  return (n+1)

The problem is when I run the ‘test’ function above I receive the following output:

h> test

test/file1.hs:4:10: Not in scope: `x'

Left "Cannot add module Main to context: not a home module"
it :: Either String Int

What the ..? My exception handler did catch an error, but:

  1. It is a strange one
  2. The error I intended to catch got printed to the screen instead

Is there a way to fix this?

Solution

I even asked about this problem on the Haskell-Cafe mailing list, but the folks there don’t seem to be very keen on GHC/GHC API (which is understandable) and I didn’t get any answers.

But thanks to my mentor Luite Stegeman we’ve found the solution.

Errors are handled using the LogAction specified in the DynFlags for your GHC session. So to fix this you need to change the ‘log_action’ field in DynFlags. For example, you can do this:

initGhc = do
  ..
  ref <- liftIO $ newIORef ""
  dfs <- getSessionDynFlags
  setSessionDynFlags $ dfs { hscTarget  = HscInterpreted
                           , ghcLink    = LinkInMemory
                           , log_action = logHandler ref -- ^ this
                           }

-- LogAction == DynFlags -> Severity -> SrcSpan -> PprStyle -> MsgDoc -> IO ()
logHandler :: IORef String -> LogAction
logHandler ref dflags severity srcSpan style msg =
  case severity of
     SevError ->  modifyIORef' ref (++ printDoc)
     SevFatal ->  modifyIORef' ref (++ printDoc)
     _        ->  return () -- ignore the rest
  where cntx = initSDocContext dflags style
        locMsg = mkLocMessage severity srcSpan msg
        printDoc = show (runSDoc locMsg cntx)
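To show how the collected messages can be read back, here is a self-contained sketch built on the same idea. It assumes the GHC 7.6-era LogAction with five arguments, as in the comment above; the names collect and collectErrors are hypothetical, and mkLocMessage is imported from ErrUtils.

```haskell
import Data.IORef
import DynFlags
import ErrUtils (mkLocMessage)
import GHC
import GHC.Paths (libdir)
import Outputable (initSDocContext, runSDoc)

-- | Hypothetical log handler: like logHandler above, but it keeps the
-- messages as a list (newest first) instead of one long string.
collect :: IORef [String] -> LogAction
collect ref dflags severity srcSpan style msg =
  case severity of
    SevError -> modifyIORef' ref (rendered :)
    SevFatal -> modifyIORef' ref (rendered :)
    _        -> return () -- ignore warnings, info and the rest
  where
    cntx     = initSDocContext dflags style
    rendered = show (runSDoc (mkLocMessage severity srcSpan msg) cntx)

-- | Hypothetical driver: try to load a file and return whether loading
-- succeeded, together with the logged error messages in order.
collectErrors :: FilePath -> IO (Bool, [String])
collectErrors file = do
  ref <- newIORef []
  ok  <- runGhc (Just libdir) $ do
    dfs <- getSessionDynFlags
    _ <- setSessionDynFlags $ dfs { hscTarget  = HscInterpreted
                                  , ghcLink    = LinkInMemory
                                  , log_action = collect ref
                                  }
    target <- guessTarget file Nothing
    setTargets [target]
    flag <- load LoadAllTargets
    return (succeeded flag)
  msgs <- readIORef ref
  return (ok, reverse msgs)
```

Checking the result of load before calling setContext also avoids the confusing “not a home module” error from the question: that message only appeared because the code carried on after loading had already failed.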

Outro

That’s the first tip and the first post in the series; stay tuned for more updates.