Q:

What are the optimization techniques in Spark, or what optimizations have you done in your Spark projects?

# Persist a DataFrame that is reused inside a loop, and release it afterwards
filtered_df = filter_input_data(initial_data)
filtered_df.persist()
for obj in list_objects:
    compute_df = compute_dataframe(filtered_df, obj)
    percentage_df = calculate_percentage(compute_df)
    export_as_csv(percentage_df)
filtered_df.unpersist()
>>> df = spark.createDataFrame(
...     [('1', 'true'), ('2', 'false'),
...      ('1', 'true'), ('2', 'false'),
...      ('1', 'true'), ('2', 'false'),
...      ('1', 'true'), ('2', 'false'),
...      ('1', 'true'), ('2', 'false')])
>>> df.rdd.getNumPartitions()
8
# Now performing a groupBy operation
>>> group_df = df.groupBy("_1").count()
>>> group_df.show()
+---+-----+
| _1|count|
+---+-----+
|  1|    5|
|  2|    5|
+---+-----+
>>> group_df.rdd.getNumPartitions()
200
