Sunday, December 10, 2023

PySpark Spark SQL

from pyspark.sql import SparkSession


# Create a Spark session
spark = SparkSession.builder.appName("example").getOrCreate()

# Create a DataFrame
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
df.show()
+-------+---+
|   Name|Age|
+-------+---+
|  Alice| 25|
|    Bob| 30|
|Charlie| 35|
+-------+---+
# Register the DataFrame as a temporary SQL table
df.createOrReplaceTempView("people")

# Use Spark SQL to query the table
result = spark.sql("SELECT * FROM people WHERE Age >= 30")

# Show the result
result.show()


+-------+---+
|   Name|Age|
+-------+---+
|    Bob| 30|
|Charlie| 35|
+-------+---+
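
If you need the rows back in Python rather than printed output, the standard DataFrame actions apply. A minimal sketch (note that collect() pulls everything to the driver, so it is only safe on small results):

# Materialize the query result on the driver as Row objects
rows = result.collect()
for row in rows:
    print(row["Name"], row["Age"])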


In this example, we:

  1. Create a Spark session.
  2. Create a DataFrame from a list of tuples.
  3. Register the DataFrame as a temporary SQL table named "people".
  4. Use spark.sql() to execute a SQL query on the "people" table; the same filter can also be written with the DataFrame API, as sketched below.
  5. Show the result.
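
For comparison, here is a minimal sketch of step 4 written with the DataFrame API instead of SQL text; it produces the same result:

# The same filter expressed with the DataFrame API rather than SQL
from pyspark.sql import functions as F

result_df = df.filter(F.col("Age") >= 30)
result_df.show()  # same two rows as the SQL version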

This is a simple example, but spark.sql() lets you run arbitrarily complex SQL (joins, aggregations, window functions) over large-scale distributed data. Make sure to adjust the session configuration to match your Spark cluster setup; a sketch of both follows.
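
For instance, here is a sketch of a slightly richer query (an aggregation) against the same temp view, plus an illustration of passing settings through the session builder. The config key shown is a real Spark setting, but the value is illustrative and depends on your deployment:

# An aggregation over the registered view, still plain SQL
stats = spark.sql("""
    SELECT COUNT(*) AS n, AVG(Age) AS avg_age
    FROM people
    WHERE Age >= 30
""")
stats.show()

# Cluster-dependent settings go on the builder (value here is illustrative)
spark = (
    SparkSession.builder
    .appName("example")
    .config("spark.sql.shuffle.partitions", "200")  # shuffle parallelism
    .getOrCreate()
)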
