from pyspark.sql import SparkSession
# Create a Spark session
spark = SparkSession.builder.appName("example").getOrCreate()
# Create a DataFrame
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
df.show()
+-------+---+
|   Name|Age|
+-------+---+
|  Alice| 25|
|    Bob| 30|
|Charlie| 35|
+-------+---+
# Register the DataFrame as a temporary SQL table
df.createOrReplaceTempView("people")
# Use Spark SQL to query the table
result = spark.sql("SELECT * FROM people WHERE Age >= 30")
# Show the result
result.show()
+-------+---+
|   Name|Age|
+-------+---+
|    Bob| 30|
|Charlie| 35|
+-------+---+
In this example, we:
- Create a Spark session.
- Create a DataFrame from a list of tuples.
- Register the DataFrame as a temporary SQL table named "people".
- Use spark.sql() to execute a SQL query on the "people" table.
- Show the result.
This is a simple example, but spark.sql() lets you run far more complex SQL (joins, aggregations, window functions) on large-scale distributed data. Make sure to adjust the configuration and settings based on your Spark cluster setup.
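To illustrate both points, here is a minimal sketch of a slightly more involved aggregation against the same "people" view, built with an explicitly configured session. The master URL and the shuffle-partition value are placeholders for this example, not recommendations; substitute whatever matches your own cluster.

from pyspark.sql import SparkSession

# Explicitly configured session; "local[*]" and the partition count
# below are placeholder values to adapt to your cluster.
spark = (
    SparkSession.builder
    .appName("example")
    .master("local[*]")  # e.g. "spark://host:7077" on a real cluster
    .config("spark.sql.shuffle.partitions", "8")  # tune for your data size
    .getOrCreate()
)

data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
df = spark.createDataFrame(data, ["Name", "Age"])
df.createOrReplaceTempView("people")

# An aggregation with GROUP BY, run through the same spark.sql() entry point
result = spark.sql("""
    SELECT CASE WHEN Age >= 30 THEN '30+' ELSE 'under 30' END AS age_band,
           COUNT(*) AS cnt,
           AVG(Age) AS avg_age
    FROM people
    GROUP BY CASE WHEN Age >= 30 THEN '30+' ELSE 'under 30' END
""")
result.show()

Because the query runs through the same spark.sql() entry point as before, only the SQL string changes; Spark plans and distributes the aggregation across the cluster for you.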