
I am trying to write a custom DataFrame object, CustomFrame, that inherits from PySpark's DataFrame class. This is what it looks like:

from pyspark.sql import DataFrame

class CustomFrame(DataFrame):
    def __init__(self, spark_df, is_vector=False):
        self.is_vector = is_vector

df = CustomFrame(spark.createDataFrame([[1,2],[3,4]], ['a', 'b']))

But when I run df.show(1), I get a recursion error:

RecursionError: maximum recursion depth exceeded

I am not sure what's causing this. I haven't actually changed any of the DataFrame's behavior. Any idea why I am getting this error?

Clock Slave
  • I guess the 'DataFrame' part of your CustomFrame isn't initialized properly. You don't actually have a DataFrame currently, you only have a wrapper class. Why not just set an attribute `is_vector` on the regular dataframe? You can use getattr to get the result later (to avoid exceptions). – Hitobat Jun 15 '18 at 05:23
  • @Hitobat, I didn't quite get what you are saying by `...the 'DataFrame' part of your CustomFrame isn't initialized properly`. Can you elaborate? – Clock Slave Jun 15 '18 at 05:55
  • 3
    This might help you, https://stackoverflow.com/questions/41598383/is-it-possible-to-subclass-dataframe-in-pyspark – Suresh Jun 15 '18 at 08:11
  • Good link Suresh. – Hitobat Jun 15 '18 at 10:49
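As the comments suggest, the parent class is never initialized: CustomFrame.__init__ overrides DataFrame.__init__ without calling it. PySpark's DataFrame defines a __getattr__ that resolves unknown attributes as column names, and Python only invokes __getattr__ when normal attribute lookup fails. Since the attributes that __getattr__ itself depends on were never set, every lookup re-enters __getattr__ until the stack overflows. Here is a minimal pure-Python sketch of that failure mode (Base, Broken, and Fixed are hypothetical names for illustration, not PySpark classes):

```python
class Base:
    """Stands in for DataFrame: wraps an object and delegates via __getattr__."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal lookup fails. It reads self._data,
        # which itself triggers __getattr__ again if _data was never set,
        # producing infinite recursion.
        return getattr(self._data, name)


class Broken(Base):
    def __init__(self, data, is_vector=False):
        # Bug: super().__init__ is never called, so self._data does not exist.
        self.is_vector = is_vector


class Fixed(Base):
    def __init__(self, data, is_vector=False):
        super().__init__(data)  # initialize the parent first
        self.is_vector = is_vector


try:
    Broken([1, 2]).append(3)
except RecursionError:
    print("Broken raises RecursionError")

f = Fixed([1, 2])
f.append(3)  # delegated to the wrapped list
print(f._data)  # [1, 2, 3]
```

For the actual CustomFrame, the equivalent fix is to call the parent constructor from __init__; the exact arguments depend on your PySpark version (PySpark 2.x's DataFrame takes the underlying Java DataFrame and the SQL context, e.g. something like `super().__init__(spark_df._jdf, spark_df.sql_ctx)`), so check the DataFrame.__init__ signature of the version you are running.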

0 Answers