My string looks like this:

"Interest.USD,Vol=[Integrated,(0,0.101),(0.2,0.108),(1,0.110),(2,0.106),
(3,0.102),(4,0.09),(5,0.091),(6,0.09128272)],Drift=[Integrated,(0.002,0.09),
(0.24,0.0007),(0.4,0.007),(1,-0.033),(2,-0.005),(3,-0.0041),
(4,-0.3505),(5,-0.65),(7,-0.08346),(8,-0.049),(9,-0.0613),(10,-0.019)],
Risk_Neutral=YES,Lambda=0.09,FX_Volatility=0.01,FX_Correlation=0.9"

I want to extract the data following "Vol" and "Drift" as matrices, like this:

Vol matrix:

0,0.101
0.2,0.108
1,0.110
2,0.106
3,0.102
4,0.09
5,0.091
6,0.09128272

and also a single value such as 0.09 for Lambda. I guess I should use a regular expression, but I'm not that familiar with them. Any suggestions? :)

P.S. I tried using:

str_extract_all(text,'[ .+? ]')

to try to get the data between [ and ], but it returns "."

  • You should use regular expressions. Have you tried learning how to use them? Commented Jun 19, 2014 at 14:25
  • @SeñorO Hi, thank you for the comment. I edited my question to show what I have tried. Any suggestions for the code are welcome :) Commented Jun 19, 2014 at 14:33
  • Why didn't you mention 2,0.106 in your output? Commented Jun 19, 2014 at 14:37
  • Are there any newlines in your input string? Commented Jun 19, 2014 at 14:40
  • @AvinashRaj Sorry, that was a mistake. There are no newlines in my input. Commented Jun 19, 2014 at 14:46

2 Answers

Here's a way to extract those values in R. Let's assume that the string you posted is stored in a variable named a. To make things easier, I'm going to use a helper function, regcapturedmatches(), which is not part of base R. Then you can do

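# Pass 1: capture each section name and the text inside its brackets.
expr <- "(Vol|Drift)=\\[Integrated,([^\\]]*)\\]"
mm <- regcapturedmatches(a, gregexpr(expr, a, perl=TRUE))[[1]]
# Pass 2: capture the two numbers inside every "(x,y)" pair.
expr <- "\\(([^,]+),([^,]+)\\)"
vv <- regcapturedmatches(mm[,2], gregexpr(expr, mm[,2], perl=TRUE))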

First we do a pass to extract the Vol and Drift elements in mm and then we split the comma delimited lists into vv. Now we can combine the data into one large data.frame
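The helper used above is not part of base R, so those two passes will not run without it. As a rough stand-in, here is a minimal base-R sketch. It assumes the helper returns one character matrix of capture groups per input string, which is what the indexing with mm[,1] and mm[,2] suggests. The name captured_groups is hypothetical and this is not the original helper's implementation; it only reads the capture.start and capture.length attributes that gregexpr() attaches when perl=TRUE.

```r
# The string from the question, stored in `a` as in the answer.
a <- paste0(
  "Interest.USD,Vol=[Integrated,(0,0.101),(0.2,0.108),(1,0.110),(2,0.106),",
  "(3,0.102),(4,0.09),(5,0.091),(6,0.09128272)],Drift=[Integrated,(0.002,0.09),",
  "(0.24,0.0007),(0.4,0.007),(1,-0.033),(2,-0.005),(3,-0.0041),",
  "(4,-0.3505),(5,-0.65),(7,-0.08346),(8,-0.049),(9,-0.0613),(10,-0.019)],",
  "Risk_Neutral=YES,Lambda=0.09,FX_Volatility=0.01,FX_Correlation=0.9"
)

# Hypothetical stand-in for the helper: for each input string, return a
# character matrix with one row per match and one column per capture group.
captured_groups <- function(x, m) {
  unname(Map(function(s, mi) {
    st <- attr(mi, "capture.start")   # start position of every group
    ln <- attr(mi, "capture.length")  # length of every group
    matrix(substring(s, st, st + ln - 1L), nrow = nrow(st))
  }, x, m))
}

# Pass 1: section name and bracket contents for Vol and Drift.
expr <- "(Vol|Drift)=\\[Integrated,([^\\]]*)\\]"
mm <- captured_groups(a, gregexpr(expr, a, perl = TRUE))[[1]]

# Pass 2: the two numbers of every "(x,y)" pair, still as character.
expr <- "\\(([^,]+),([^,]+)\\)"
vv <- captured_groups(mm[, 2], gregexpr(expr, mm[, 2], perl = TRUE))
```

With these definitions, mm is a 2-row character matrix and vv is a list of two character matrices. The sketch does not handle strings with no match at all.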

tt <- Map(data.frame, col=mm[,1], val=lapply(vv, 
    function(x) {class(x)<-"numeric"; x}))
dd<-do.call(rbind, unname(tt))

In the end dd will look like

     col  val.1       val.2
1    Vol  0.000  0.10100000
2    Vol  0.200  0.10800000
3    Vol  1.000  0.11000000
4    Vol  2.000  0.10600000
5    Vol  3.000  0.10200000
6    Vol  4.000  0.09000000
7    Vol  5.000  0.09100000
8    Vol  6.000  0.09128272
9  Drift  0.002  0.09000000
10 Drift  0.240  0.00070000
11 Drift  0.400  0.00700000
12 Drift  1.000 -0.03300000
13 Drift  2.000 -0.00500000
14 Drift  3.000 -0.00410000
15 Drift  4.000 -0.35050000
16 Drift  5.000 -0.65000000
17 Drift  7.000 -0.08346000
18 Drift  8.000 -0.04900000
19 Drift  9.000 -0.06130000
20 Drift 10.000 -0.01900000

This method allows for any number of repeated values in each of those sections.

If you did just want simple matrices then

Map(function(a,b) {class(b)<-"numeric"; b}, mm[,1], 
    lapply(vv, function(x) {class(x)<-"numeric"; x}))

will give you a named list of the matrices.
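The question also asks for single values such as Lambda, which the steps above do not cover. A minimal base-R sketch for that case, assuming each scalar appears as key=value and ends at the next comma or at the end of the string (get_value is a hypothetical name, not part of any package):

```r
# The string from the question, stored in `a` as in the answer.
a <- paste0(
  "Interest.USD,Vol=[Integrated,(0,0.101),(0.2,0.108),(1,0.110),(2,0.106),",
  "(3,0.102),(4,0.09),(5,0.091),(6,0.09128272)],Drift=[Integrated,(0.002,0.09),",
  "(0.24,0.0007),(0.4,0.007),(1,-0.033),(2,-0.005),(3,-0.0041),",
  "(4,-0.3505),(5,-0.65),(7,-0.08346),(8,-0.049),(9,-0.0613),(10,-0.019)],",
  "Risk_Neutral=YES,Lambda=0.09,FX_Volatility=0.01,FX_Correlation=0.9"
)

# Hypothetical helper: return the numeric value that follows "key=".
# Only meant for scalar entries, not for the bracketed Vol/Drift sections.
get_value <- function(s, key) {
  # regexec() returns the full match plus the capture group; element 2 is the group.
  m <- regmatches(s, regexec(paste0(key, "=([^,]+)"), s))[[1]]
  as.numeric(m[2])
}

lambda <- get_value(a, "Lambda")
fx_vol <- get_value(a, "FX_Volatility")
fx_cor <- get_value(a, "FX_Correlation")
```

A non-numeric entry such as Risk_Neutral=YES would need as.numeric() dropped from the helper.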

1 Comment

@Louisyan if you find this answer as a good one then don't forget to accept it.

You could try this regex. The values inside each pair of parentheses are stored in separate capture groups, which you can then refer to by group number. Note that the pattern is written for exactly eight pairs.

Vol=.*\(([\d,.]+)\).*\(([\d,.]+)\).*\(([\d,.]+)\).*\(([\d,.]+)\).*\(([\d,.]+)\).*\(([\d,.]+)\).*\(([\d,.]+)\).*\(([\d,.]+)\).*(?=,Drift)

DEMO

See the stored group on the right-side.
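To use this pattern from R, the backslashes have to be doubled and the Perl-compatible engine switched on, since the pattern relies on \d and a lookahead. The following is only a sketch under those assumptions. It builds the eight repeated sub-patterns with strrep() and reads the groups with regexec(), both of which are assumed to be available with perl support in your version of R:

```r
# The string from the question.
a <- paste0(
  "Interest.USD,Vol=[Integrated,(0,0.101),(0.2,0.108),(1,0.110),(2,0.106),",
  "(3,0.102),(4,0.09),(5,0.091),(6,0.09128272)],Drift=[Integrated,(0.002,0.09),",
  "(0.24,0.0007),(0.4,0.007),(1,-0.033),(2,-0.005),(3,-0.0041),",
  "(4,-0.3505),(5,-0.65),(7,-0.08346),(8,-0.049),(9,-0.0613),(10,-0.019)],",
  "Risk_Neutral=YES,Lambda=0.09,FX_Volatility=0.01,FX_Correlation=0.9"
)

# One "(x,y)" pair with its contents captured; repeated eight times below.
pair <- ".*\\(([\\d,.]+)\\)"
pattern <- paste0("Vol=", strrep(pair, 8), ".*(?=,Drift)")

# Element 1 of the result is the whole match; the rest are the eight groups.
m <- regmatches(a, regexec(pattern, a, perl = TRUE))[[1]]
groups <- m[-1]

# Split each "x,y" group on the comma and stack the rows into a numeric matrix.
vol <- do.call(rbind, lapply(strsplit(groups, ",", fixed = TRUE), as.numeric))
```

Because the number of groups is fixed in the pattern, a Vol section with more or fewer than eight pairs would need a different repeat count.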
