
I am trying to write a custom DataFrame object, CustomFrame, that inherits from PySpark's DataFrame class. This is what it looks like:

from pyspark.sql import DataFrame

class CustomFrame(DataFrame):
    def __init__(self, spark_df, is_vector=False):
        self.is_vector = is_vector

df = CustomFrame(spark.createDataFrame([[1,2],[3,4]], ['a', 'b']))

But when I run df.show(1), I get a recursion error:

RecursionError: maximum recursion depth exceeded

I am not sure what's causing this. I haven't actually changed any of the DataFrame's behavior. Any idea why I am getting this error?

Clock Slave
  • I guess the 'DataFrame' part of your CustomFrame isn't initialized properly. You don't actually have a DataFrame currently, you only have a wrapper class. Why not just set an attribute `is_vector` on the regular dataframe? You can use getattr to get the result later (to avoid exceptions). – Hitobat Jun 15 '18 at 05:23
  • @Hitobat, I didn't quite get what you are saying by `...the 'DataFrame' part of your CustomFrame isn't initialized properly`. Can you elaborate? – Clock Slave Jun 15 '18 at 05:55
  • 3
    This might help you, https://stackoverflow.com/questions/41598383/is-it-possible-to-subclass-dataframe-in-pyspark – Suresh Jun 15 '18 at 08:11
  • Good link Suresh. – Hitobat Jun 15 '18 at 10:49
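As the comments suggest, the parent class is never initialized: CustomFrame.__init__ overrides DataFrame.__init__ without calling it. PySpark's DataFrame defines a __getattr__ that resolves unknown attributes as column names, and Python only invokes __getattr__ when normal attribute lookup fails. Since the attributes that __getattr__ itself depends on were never set, every lookup re-enters __getattr__ until the stack overflows. Here is a minimal pure-Python sketch of that failure mode (Base, Broken, and Fixed are hypothetical names for illustration, not PySpark classes):

```python
class Base:
    """Stands in for DataFrame: wraps an object and delegates via __getattr__."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal lookup fails. It reads self._data,
        # which itself triggers __getattr__ again if _data was never set,
        # producing infinite recursion.
        return getattr(self._data, name)


class Broken(Base):
    def __init__(self, data, is_vector=False):
        # Bug: super().__init__ is never called, so self._data does not exist.
        self.is_vector = is_vector


class Fixed(Base):
    def __init__(self, data, is_vector=False):
        super().__init__(data)  # initialize the parent first
        self.is_vector = is_vector


try:
    Broken([1, 2]).append(3)
except RecursionError:
    print("Broken raises RecursionError")

f = Fixed([1, 2])
f.append(3)  # delegated to the wrapped list
print(f._data)  # [1, 2, 3]
```

For the actual CustomFrame, the equivalent fix is to call the parent constructor from __init__; the exact arguments depend on your PySpark version (PySpark 2.x's DataFrame takes the underlying Java DataFrame and the SQL context, e.g. something like `super().__init__(spark_df._jdf, spark_df.sql_ctx)`), so check the DataFrame.__init__ signature of the version you are running.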

0 Answers