Reimplementing graphmod as a source plugin: graphmod-plugin
Posted on August 9, 2018
You may have heard about source plugins by now. They allow you to modify and inspect the compiler’s intermediate representation. This is useful for extending GHC and performing static analysis of Haskell programs.
In order to test them out, I reimplemented the graphmod
tool as a source plugin. graphmod
generates a graph of the module structure of your package. Reimplementing it as a source plugin makes the implementation more robust. I implemented it as as a type checker plugin which runs after type checking has finished. The result: graphmod-plugin
Architecture
The plugin runs once at the end of type checking for each module. Therefore, if we want to collate information about multiple modules, we must first serialise the information we want and then once all the modules have finished compiling collect all serialised files and process the information.
We will therefore first define a plugin which extracts all the import information from one module before defining the suitable executable which collects all the import information and produces the final output graph.
graphmod-plugin
consists of a library which exports the plugin and an executable which is then invoked to render the information. Here is how to directly use the two in tandem:
# Run the plugin on the source file
ghc -fplugin=GraphMod -fplugin-opt:GraphMod:/output/dir
# Collect the information which was produced
graphmod-plugin --indir /output/dir > modules.dot
Once the dot file has been generated, you can use the normal graphviz utilities to render the file.
tred
removes transitive edges from the graph before we render the graph as a pdf.
The plugin
A type checker plugin is a function of the following type:
The TcGblEnv
is the output of the type checker, it contains all the type checked bindings in addition to lots of other useful information. We are interested in just the imports, these are located in the tcg_rn_imports
field.
An LImportDecl GhcRn
is a data type which contains information about each import.
-- GHC data types
type LImportDecl pass = Located (ImportDecl pass)
data ImportDecl pass
= ImportDecl {
ideclExt :: XCImportDecl pass,
ideclSourceSrc :: SourceText,
ideclName :: Located ModuleName, -- ^ Module name.
ideclPkgQual :: Maybe StringLiteral, -- ^ Package qualifier.
ideclSource :: Bool, -- ^ True <=> {-\# SOURCE \#-} import
ideclSafe :: Bool, -- ^ True => safe import
ideclQualified :: Bool, -- ^ True => qualified
ideclImplicit :: Bool, -- ^ True => implicit import (of Prel
ude)
ideclAs :: Maybe (Located ModuleName), -- ^ as Module
ideclHiding :: Maybe (Bool, Located [LIE pass])
-- ^ (True => hiding, names)
}
Along with the module name, there is lots of meta information about other aspects of the import such as whether it was qualified and so on. Our plugin will take this information and convert it into the format expected by the existing graphmod
library.
# Graphmod data types
data Import = Import { impMod :: ModName, impType :: ImpType }
data ImpType = NormalImp | SourceImp
The graphmod
Import
data type is a simplified version of ImportDecl
. It’s straightforward to extract the information we need. Notice how much simpler this approach is than the approach taken in the original library which uses a lexer to try to identify textually the position of the imports.
convertImport :: ImportDecl GhcRn -> GraphMod.Import
convertImport (ImportDecl{..}) =
GraphMod.Import { impMod = convertModName (ideclName)
, impType = if ideclSource
then GraphMod.SourceImp
else GraphMod.NormalImp
}
convertModName :: Located ModuleName -> GraphMod.ModName
convertModName (L _ mn) = GraphMod.splitModName (moduleNameString mn)
Notice that it is also possible to extend the GraphMod.Import
data type to contain new information easily. In the previous implementation this would be much more effort as the lexing approach is fragile.
Serialisation
Once we have gathered this information we need to serialise it and write it to disk so that once we have compiled all the modules we can deserialise it and render the final graph.
As we are using GHC, we can use the same serialisation machinery as GHC uses to write interface files. Of course, you are free to use whatever serialisation library you like but there are already instances defined for GHC specific types. We won’t need any of them in this example but they can be useful. The writeBinary
function takes a value serialisable by the GHC.Binary
class and writes it to the file.
import Binary
initBinMemSize :: Int
initBinMemSize = 1024 * 1024
writeBinary :: Binary a => FilePath -> a -> IO ()
writeBinary path payload = do
bh <- openBinMem initBinMemSize
put_ bh payload
writeBinMem bh path
We also needed to write some simple Binary
instances by hand in order to do the serialisation.
instance Binary GraphMod.Import where
put_ bh (GraphMod.Import mn ip) = put_ bh mn >> put_ bh ip
get bh = GraphMod.Import <$> get bh <*> get bh
instance Binary GraphMod.ImpType where
put_ bh c =
case c of
GraphMod.NormalImp -> putByte bh 0
GraphMod.SourceImp -> putByte bh 1
get bh = getByte bh >>= return . \case
0 -> GraphMod.NormalImp
1 -> GraphMod.SourceImp
_ -> error "Binary:GraphMod"
Plugin Description
Once we have these parts, we can assemble them into the final plugin. We first get the imports out of tcg_rn_imports
and then convert them using convertImport
. We then write this information to a uniquely named file in the output directory which is passed as an argument to the plugin.
-- The main plugin function, it collects and serialises the import
-- information for a module.
install :: [CommandLineOption] -> ModSummary -> TcGblEnv -> TcM TcGblEnv
install opts ms tc_gbl = do
let imps = tcg_rn_imports tc_gbl
gm_imps = map (convertImport . unLoc) imps
outdir = mkOutdir opts
path = mkPath outdir (ms_mod ms)
gm_modname = getModName ms
liftIO $ do
createDirectoryIfMissing False outdir
writeBinary path (gm_modname, gm_imps)
return tc_gbl
mkPath
tries to come up with a unique name for a module by using the moduleUnitId
. The file name doesn’t matter particularly as long as it’s unique. We could instead write this information to a database or to a file handle. Writing to disk is just a convenient method of serialisation.
mkPath :: FilePath -> Module -> FilePath
mkPath fp m
= fp </> (moduleNameString (moduleName m) ++ (show (moduleUnitId m)))
Then, we define the plugin by making a definition called plugin
and overriding the typeCheckResultAction
field and the pluginRecompile
field. purePlugin
means that the result of our plugin only depends on the contents of the source file rather than any external information. This means that we don’t need to recompile the module every time just because we are using a plugin.
-- Installing the plugin
plugin :: Plugin
plugin = defaultPlugin {
typeCheckResultAction = install
, pluginRecompile = purePlugin
}
Now that our module exports an identifier of type Plugin
called plugin
we are finished defining the plugin part of the project.
The finaliser
Once all the modules have finished compiling. They will have written their information to a file in a certain directory that we can now inspect to create the dot graph.
We define an executable to do this. The executable takes the directory of the files as an argument, reads all the files and then processes them to produce the graph.
In the collectImports
function, we first read the directory from a command line argument. Then we find all the files in this directory and read their contents into memory. We use the helper function readImports
which uses functions from the Binary
module to read the serialised files. Finally, we build the graph using all the import information and then pass the graph we have built to the existing graphmod
backend.
collectImports :: IO ()
collectImports = do
raw_opts <- getArgs
let (fs, _ms, _errs) = getOpt Permute options raw_opts
opts = foldr ($) default_opts fs
outdir = inputDir opts
files <- listDirectory outdir
usages <- mapM (readImports outdir) files
let graph = buildGraph opts usages
putStr (GraphMod.make_dot opts graph)
readImports :: FilePath -> FilePath -> IO Payload
readImports outdir fp = do
readBinMem (outdir </> fp) >>= get
The buildGraph
function builds an in memory representation of the module graph. There is a node for each module and an edge between modules if one imports the other. We finally mimic the original graphmod
tool and output the representation of the graph on stdout
. This can then be piped to dot
in order to render the graph.
Running the plugin with nix
By far the most convenient way to run the plugin is with nix. This gets around the problem of having to run the finaliser after compiling the plugin. We use the haskell-nix-plugin
infrastructure in order to do this.
The information required to run the plugin consists of information about the plugin package but also an additional, optional, final phase which runs after the module has finished compiling.
graphmod =
{ pluginPackage = hp.graphmod-plugin;
pluginName = "GraphMod";
pluginOpts = (out-path: ["${out-path}/output"]);
pluginDepends = [ nixpkgs.graphviz ];
finalPhase = out-path: ''
graphmod-plugin --indir ${out-path}/output > ${out-path}/out.dot
cat ${out-path}/out.dot | tred | dot -Gdpi=600 -Tpng > ${out-path}/modules.png
''; } ;
I will add this definition to the plugins.nix
file in haskell-nix-plugin
once ghc-8.6.1 is released.
We then would use the addPlugin
function in order to run the plugin on a package. In order to get the module graph we inspect the GraphMod
output.
Running this script on aeson
produces this quite large image which shows the whole module graph.
A complete example default.nix
can be found in the repo.
Conclusion
We have described one way in which one can structure a plugin. There are probably other ways but this seems ergnomic and convenient. Hopefully others will find this quite detailed summary and reference code useful to build upon.
Writing plugins is quite similar to modifying GHC itself so if you need help, the best place to ask is either on the ghc-devs
mailing list or on #ghc on freenode.