Here is my suggestion for this question.
As I understand it, you have a data frame with a date column and prices for thousands of companies. Here is a small example data frame called prices:
> prices
newdates nsp1 nsp2 nsp3 nsp4
1 2000-01-03 NA NA NA NA
2 2000-01-04 79.5 325.0 NA 961
3 2000-01-05 79.5 322.5 NA 945
4 2000-01-06 79.5 327.5 NA 952
5 2000-01-07 NA 327.5 NA 941
6 2000-01-10 79.5 327.5 NA 946
7 2000-01-11 79.5 327.5 NA 888
To create a new data frame of log-returns, I used the code below:
logs=data.frame(
+ cbind.data.frame(
+ newdates[-1],
+ diff(as.matrix(log(prices[,-1])))
+ )
+ )
> logs
newdates..1. nsp1 nsp2 nsp3 nsp4
1 2000-01-04 NA NA NA NA
2 2000-01-05 0 -0.007722046 NA -0.016789481
3 2000-01-06 0 0.015384919 NA 0.007380107
4 2000-01-07 NA 0.000000000 NA -0.011621895
5 2000-01-10 NA 0.000000000 NA 0.005299429
6 2000-01-11 0 0.000000000 NA -0.063270826
To clarify what is going on in this code, let's analyze it from the inside out:
Step 1: Calculating log-returns
- Since log(a/b) = log(a) - log(b), log-returns are just differences of logarithms. The function diff(x, lag=1) calculates differences at the given lag; lag=1 is the default and gives first differences.
- Our x is the set of prices in the data frame. To pick every column except the first (which holds the dates), we use prices[,-1].
- We need logarithms, so log(prices[,-1]).
- diff() works on a vector or a matrix, so we need to treat the calculated logarithms as a matrix, thus as.matrix(log(prices[,-1])).
- Now we can use diff() with lag=1, so diff(as.matrix(log(prices[,-1]))).
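As a rough sketch of Step 1, the snippet below rebuilds the example prices data frame by hand (values copied from the table above) and applies the same chain of calls. The names log_prices, returns, p and direct are only illustrative and do not appear in the original code.

```r
# Example prices from the post, rebuilt by hand for illustration.
prices <- data.frame(
  newdates = as.Date(c("2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06",
                       "2000-01-07", "2000-01-10", "2000-01-11")),
  nsp1 = c(NA, 79.5, 79.5, 79.5, NA, 79.5, 79.5),
  nsp2 = c(NA, 325.0, 322.5, 327.5, 327.5, 327.5, 327.5),
  nsp3 = rep(NA_real_, 7),
  nsp4 = c(NA, 961, 945, 952, 941, 946, 888)
)

# Drop the date column, take logs, convert to a matrix, then take first differences.
log_prices <- as.matrix(log(prices[, -1]))
returns <- diff(log_prices, lag = 1)

# Cross-check one column against the direct definition log(p_t / p_{t-1}).
p <- prices$nsp4
direct <- log(p[-1] / p[-length(p)])
all.equal(unname(returns[, "nsp4"]), direct)
```

The result has one row fewer than prices, and any return that touches an NA price is itself NA, which matches the logs output shown earlier.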
Step 2: Creating dataframe of log-returns and dates
We can't just use cbind(). Firstly, the lengths are different (the returns have one row fewer than the prices), so we need to drop the first date: newdates[-1]. This assumes the date column is available as a standalone vector named newdates; otherwise use prices$newdates[-1].
Secondly, with cbind() the dates would be converted into numeric values such as 160027.
So here we have to use cbind.data.frame(x,y), as seen above.
Now the data is ready, and we can wrap it in data.frame() and name the result logs, i.e. logs=data.frame(...) as above.
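To illustrate why plain cbind() is a problem for the dates, here is a minimal sketch on a small made-up subset of the data. The names dates, px, ret and m are hypothetical, and the final call uses data.frame() directly with a named date column, which is a slight simplification of the nested call above rather than the exact original code.

```r
# Small illustrative example: three dates and two price columns.
dates <- as.Date(c("2000-01-04", "2000-01-05", "2000-01-06"))
px <- cbind(nsp2 = c(325.0, 322.5, 327.5), nsp4 = c(961, 945, 952))
ret <- diff(log(px))            # 2 rows: one fewer than the prices

# Plain cbind() builds a numeric matrix, so the dates lose their Date class.
m <- cbind(dates[-1], ret)
is.numeric(m)                   # TRUE: dates are now day counts since 1970-01-01

# data.frame() keeps each column's own type, so the dates stay dates.
logs <- data.frame(newdates = dates[-1], ret)
class(logs$newdates)            # "Date"
```

Naming the date column explicitly also avoids the auto-generated column name newdates..1. seen in the output above.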
If your dataset looks like the prices data frame, this should run. The most important thing is to use diff(log(x)) to calculate log-returns easily.
If you have any questions or problems, just ask.