I want to declare a function to get the cogroup of two RDD. Actually it's a interSectionByKey. The code below can't be compiled:
def getRetain[K, V](activeUserRdd : RDD[(K, V)], newUserRdd : RDD[(K, V)]): RDD[(K, V)] ={
activeUserRdd.cogroup(newUserRdd).flatMapValues{
x => Option((if (!x._1.isEmpty && !x._2.isEmpty) x._2.head else null).asInstanceOf[V])
}
}
Error:
value cogroup is not a member of org.apache.spark.rdd.RDD[(K, V)]
I think (K, V) miss matched the real [(K, V)] declared in cogroup, but which is the right way to declare in my function?