Task
I am trying to write a Haskell application that parses a bunch of complex numbers from a Mathematica output and then produces a PPM image. The end product looks like this:
[![Plotted data][1]][1] [1]: https://i.sstatic.net/Zg2es.jpg
Code
The code to create this is as follows:
module Main where
import PPM
import System.IO
import Control.Applicative as A
import Control.Monad.ST
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as M
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as LB
import qualified Data.Attoparsec.ByteString.Char8 as P
import qualified Data.Attoparsec.ByteString.Lazy as LP
apply :: a -> (a -> b) -> b
apply a = \f -> f a
---------- PARSING ----------
iT = B.pack "*I"
data Complex = Complex Double Double
deriving (Show)
cAdd:: Complex -> [Complex] -> Complex
cAdd (Complex a b) [(Complex c d)] = Complex (a + c) (b + d)
cAdd c _ = c
cConv :: Complex -> Complex
cConv (Complex a b) = Complex b a
parseTerm :: P.Parser Complex
parseTerm = parseDouble <*> parseConv
where
parseDouble = liftA (\x -> apply (Complex x 0)) $
liftA2 (\x y -> read ((x:y) ++ "0")) P.anyChar $ many $ P.satisfy $ \x -> P.isDigit x || x == '.'
parseConv = P.option id $ liftA (const cConv) $ P.string iT
test :: P.Parser String
test = liftA2 (\x y -> (x:y)) P.anyChar $ many $ P.satisfy $ \x -> P.isDigit x || x == '.'
parseComplex :: P.Parser Complex
parseComplex = liftA2 cAdd parseTerm $ many $ many (P.char '+') A.*> parseTerm
---------- IMAGE PROCESSING ----------
bounds = 2.0 :: Double
imgSize = 400 :: Int
intensity = 300.0 :: Double
falloff = 0.8 :: Double
(r, g, b) = (255.0, 76.5, 25.5) :: (Double, Double, Double)
computeIndex :: Double -> Int
computeIndex x = (+) 1 $ floor $ (x + bounds) / (2 * bounds / fromIntegral imgSize)
convCoord ::(Int, Int) -> Int
convCoord (h, v) = v * imgSize + h
computeCIndex :: Complex -> Int
computeCIndex (Complex r i) = convCoord (computeIndex r, computeIndex i)
genVec :: [Complex] -> V.Vector Int
genVec xs = runST $ do
mv <- M.replicate (imgSize ^ 2) 0
mapM_ (incr mv) $ map computeCIndex xs -- unsafeModify
V.freeze mv
where
incr mv i = do
pre <- M.unsafeRead mv i
M.unsafeWrite mv i $ pre + 1
colorFunction :: Double -> (Int, Int, Int)
colorFunction d = (floor (y * r), floor(y * g), floor(y * b))
where y = (intensity * d) ** falloff
colorFunction1 n = (n, n, n)
genImage :: V.Vector Int -> PPM
genImage v = PPM (V.map (\x -> colorFunction((fromIntegral x) / (fromIntegral mx))) v) imgSize imgSize 255
where mx = V.maximum v
extr :: LP.Result a -> a
extr (LP.Done _ r) = r
extr (LP.Fail _ _ y) = error y
---------- MAIN ----------
main = do
rawData <- liftA LB.words (LB.readFile "/mnt/hgfs/outputs/out_1-14.txt")
let formatedData = map (extr.LP.parse parseComplex) rawData
h <- openFile "test.ppm" WriteMode
writeImage (genImage (genVec formatedData)) h
hClose h
Parsing is done with completely with Applicative Attoparsec parsers on Lazy Bytestrings. Input files are expected to be roughly 3Gbs-8Gbs in size, containing anywhere from 70-600 million roots. Each complex number is separated by a newline and looks something like this:
1.9238394+0.32423423*I
Mathematica outputs are a bit funky, retaining a decimal point, for whole numbers:
1.+123123125123*I
After parsing, I generate the PPM, represented by a mutable vector by first computing the index of the vector that corresponds to each Complex number, then mapping a Monad action over each of these indices. The end result is I have a mutable vector that kind of resembles a sort of histogram of points. Finally, the function colorFunction assigns a color to each point and I write the image to a file.
Questions
- This program takes in a lot of data so I would like it to perform very well. However, I'm not experienced enough to really know what kind of experience I should expect. Is this program structured in a good way? Am I using the right data structures and functions? Are there any small refinements I can make?
- Since data structures here are lazy, I'm a little bit unsure about how I can do benchmarks on this program. For instance, if I want to just measure the performance of the parser, how can I force ghc to evaluate that portion of the code without generating an image or doing a bunch of IO?
- I have access to the Research Computing Center at the University of Chicago. I'm interested in whether I would be able to get better performance out of this if the code could be run in parallel. If so, where would I get started?