SparkToPandas package
Submodules
SparkToPandas.SparkToPandas module
SparkToPandas Documentation
SparkToPandas is a simple plugin that works alongside Spark. It is designed to let you work with PySpark using a syntax closer to pandas.
- class SparkToPandas.SparkToPandas.Spark_pandas(spark)
Bases: object
A set of supporting functions for PySpark with a syntax similar to pandas.
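A minimal sketch of constructing the class. The import path follows the module name documented above; that the `spark` argument is a `SparkSession` is an assumption, and `make_helper` and the application name are hypothetical.

```python
def make_helper(app_name="SparkToPandasDemo"):
    """Build a local SparkSession and wrap it in Spark_pandas."""
    # Imports are deferred so this module loads even where Spark is not installed.
    from pyspark.sql import SparkSession
    from SparkToPandas.SparkToPandas import Spark_pandas

    spark = (
        SparkSession.builder
        .master("local[*]")      # run Spark locally on all available cores
        .appName(app_name)
        .getOrCreate()
    )
    # Assumption: the constructor expects a SparkSession as `spark`.
    return Spark_pandas(spark)
```

Calling `make_helper()` would then give an object whose methods are the ones listed below.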
- barChart(df, x, y, hue, title, aspect='horizontal')
Plots a bar chart using the seaborn module.
- Parameters
df – dataframe
x – str
y – str
hue – str
title – str
aspect – str
- Returns
None
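A possible call, shown as a sketch. The parameter names come from the signature above; the column names, the title, and the `plot_sales` wrapper are hypothetical, and `sp` is assumed to be a `Spark_pandas` instance.

```python
def plot_sales(sp, df):
    """Draw a bar chart of sales per region, split by year."""
    # "region", "sales" and "year" are hypothetical column names in `df`.
    # `aspect` is left at its documented default, 'horizontal'.
    sp.barChart(df, x="region", y="sales", hue="year", title="Sales by region")
```

The method is documented as returning None, so the wrapper returns nothing either.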
- column_creator(df, primary_column, new_column_name, user_func)
Creates a new column based on a user-defined function and returns the new dataframe.
- Parameters
df – dataframe
primary_column – str
new_column_name – str
user_func – function
- Returns
dataframe
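As an illustration of the `user_func` argument, the sketch below assumes the function is called with one value of `primary_column` at a time; the documentation does not confirm this. `length_bucket`, `add_bucket_column`, and the column names are hypothetical.

```python
def length_bucket(value):
    """Hypothetical user function: label a string by its length."""
    if value is None:
        return None          # pass nulls through unchanged
    return "long" if len(value) > 5 else "short"


def add_bucket_column(sp, df):
    """Derive a "name_bucket" column from the "name" column."""
    # sp: a Spark_pandas instance; df: a dataframe with a "name" column (assumed).
    return sp.column_creator(df, "name", "name_bucket", length_bucket)
```

Keeping the user function free of Spark types makes it easy to check on plain Python values before applying it to a dataframe.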
- drop_na(df, col_name=None)
Drops null values. Either drops all null values or, when col_name is given, drops null values based on that column subset.
- Parameters
df – dataframe
col_name – str
- Returns
dataframe
- fillna(df, value, col_name=None)
Fills null values based on user choice.
- Parameters
df – dataframe
value – int/str/float
col_name – str
- Returns
dataframe
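The two null-handling methods, drop_na and fillna, can be combined. The following is a sketch under stated assumptions: `sp` is a `Spark_pandas` instance, "age" is a hypothetical column name, and `clean_nulls` is a hypothetical helper.

```python
def clean_nulls(sp, df):
    """Fill nulls in one column, then drop any rows that still contain nulls."""
    # Replace nulls in the hypothetical "age" column with 0.
    filled = sp.fillna(df, 0, col_name="age")
    # col_name is omitted here, which the docs describe as dropping all null values.
    return sp.drop_na(filled)
```

Filling first and dropping second keeps rows whose only missing value was in the filled column.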
- head(df, n)
Prints the head and tail of the dataframe depending on user’s choice.
- Parameters
df – dataframe
n – int
- Returns
None
- read_csv(file_location, header=True)
Reads a CSV file as a Spark RDD.
- Parameters
file_location – str
header – bool
- Returns
rdd
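Loading a file and previewing it with head might look like the sketch below. `preview_csv` and its default of 5 rows are hypothetical, and `sp` is assumed to be a `Spark_pandas` instance.

```python
def preview_csv(sp, file_location, n=5):
    """Load a CSV through the wrapper and print some of its rows."""
    # header=True is the documented default; it is passed explicitly for clarity.
    data = sp.read_csv(file_location, header=True)
    # head prints its output and is documented as returning None.
    sp.head(data, n)
    return data
```

Returning the loaded object lets the caller pass it on to the other methods after inspecting the printed rows.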