Return to Revisions

4 of 7

deleted 1809 characters in body; edited title

edited Aug 1, 2015 at 2:01

145.6k
22
191
481

Parsing and plotting complex numbers

Task

I am trying to write a Haskell application that parses a bunch of complex numbers from a Mathematica output and then produces a PPM image. The end product looks like this:

[![Plotted data][1]][1] [1]: https://i.sstatic.net/Zg2es.jpg

Code

The code to create this is as follows:

module Main where

import PPM
import System.IO
import Control.Applicative as A
import Control.Monad.ST
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as M
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as LB
import qualified Data.Attoparsec.ByteString.Char8 as P
import qualified Data.Attoparsec.ByteString.Lazy as LP


apply :: a -> (a -> b) -> b
apply a = \f -> f a


---------- PARSING ----------

iT = B.pack "*I"

data Complex = Complex Double Double
    deriving (Show)

cAdd:: Complex -> [Complex] -> Complex
cAdd (Complex a b) [(Complex c d)] = Complex (a + c) (b + d)
cAdd c _ = c
cConv :: Complex -> Complex
cConv (Complex a b) = Complex b a

parseTerm :: P.Parser Complex
parseTerm = parseDouble <*> parseConv
    where 
        parseDouble = liftA (\x -> apply (Complex x 0)) $
            liftA2 (\x y -> read ((x:y) ++ "0")) P.anyChar $ many $ P.satisfy $ \x -> P.isDigit x || x == '.'
        parseConv = P.option id $ liftA (const cConv) $ P.string iT

test :: P.Parser String
test = liftA2 (\x y -> (x:y)) P.anyChar $ many $ P.satisfy $ \x -> P.isDigit x || x == '.'

parseComplex :: P.Parser Complex
parseComplex = liftA2 cAdd parseTerm $ many $ many (P.char '+') A.*> parseTerm


---------- IMAGE PROCESSING ----------

bounds = 2.0 :: Double
imgSize = 400 :: Int

intensity = 300.0 :: Double
falloff = 0.8 :: Double
(r, g, b) = (255.0, 76.5, 25.5) :: (Double, Double, Double)

computeIndex :: Double -> Int
computeIndex x = (+) 1 $ floor $ (x + bounds) / (2 * bounds / fromIntegral imgSize)

convCoord ::(Int, Int) -> Int
convCoord (h, v) = v * imgSize + h

computeCIndex :: Complex -> Int
computeCIndex (Complex r i) = convCoord (computeIndex r, computeIndex i)

genVec :: [Complex] -> V.Vector Int
genVec xs = runST $ do
    mv <- M.replicate (imgSize ^ 2) 0
    mapM_ (incr mv) $ map computeCIndex xs -- unsafeModify
    V.freeze mv
        where
            incr mv i = do
                pre <- M.unsafeRead mv i
                M.unsafeWrite mv i $ pre + 1

colorFunction :: Double -> (Int, Int, Int)
colorFunction d = (floor (y * r), floor(y * g), floor(y * b))
    where y = (intensity * d) ** falloff

colorFunction1 n = (n, n, n)

genImage :: V.Vector Int -> PPM
genImage v = PPM (V.map (\x -> colorFunction((fromIntegral x) / (fromIntegral mx))) v) imgSize imgSize 255
    where mx = V.maximum v

extr :: LP.Result a -> a
extr (LP.Done _ r) = r
extr (LP.Fail _ _ y) = error y

---------- MAIN ----------

main = do
    rawData <- liftA LB.words (LB.readFile "/mnt/hgfs/outputs/out_1-14.txt")
    let formatedData = map (extr.LP.parse parseComplex) rawData

    h <- openFile "test.ppm" WriteMode
    writeImage (genImage (genVec formatedData)) h
    hClose h

Parsing is done with completely with Applicative Attoparsec parsers on Lazy Bytestrings. Input files are expected to be roughly 3Gbs-8Gbs in size, containing anywhere from 70-600 million roots. Each complex number is separated by a newline and looks something like this:

1.9238394+0.32423423*I

Mathematica outputs are a bit funky, retaining a decimal point, for whole numbers:

1.+123123125123*I

After parsing, I generate the PPM, represented by a mutable vector by first computing the index of the vector that corresponds to each Complex number, then mapping a Monad action over each of these indices. The end result is I have a mutable vector that kind of resembles a sort of histogram of points. Finally, the function colorFunction assigns a color to each point and I write the image to a file.

Questions

This program takes in a lot of data so I would like it to perform very well. However, I'm not experienced enough to really know what kind of experience I should expect. Is this program structured in a good way? Am I using the right data structures and functions? Are there any small refinements I can make?
Since data structures here are lazy, I'm a little bit unsure about how I can do benchmarks on this program. For instance, if I want to just measure the performance of the parser, how can I force ghc to evaluate that portion of the code without generating an image or doing a bunch of IO?
I have access to the Research Computing Center at the University of Chicago. I'm interested in whether I would be able to get better performance out of this if the code could be run in parallel. If so, where would I get started?

asked Aug 1, 2015 at 1:58

Frank Wang