
Spark Syntax
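1. Adding a new column: .withColumn("column_name", value)

    # withColumn returns a new DataFrame, so reassign the result
    flights_2 = flights_2.withColumn("duration_hrs", flights_2.air_time/60)
    flights_2.show(1)

Naming an expression with alias

    # avg_speed: distance divided by flight time in hours
    avg_speed = (flights_2.distance/(flights_2.air_time/60)).alias("avg_speed")
    # select from flights_2, the same DataFrame the avg_speed expression refers to
    speed_df = flights_2.select("origin", "dest", "tailnum", avg_speed)
    speed_df.show()

2. Filtering data: .filter(condition)

    # the condition can be given as a SQL-style string
    result = flights_2.filter("distance >= 1000")
    result.show(1)
    result2 = ..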
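
To check the arithmetic behind duration_hrs, avg_speed, and the distance >= 1000 filter without starting a Spark session, the per-row logic can be mirrored in plain Python. This is only a rough sketch: the helper names with_column and filter_rows and the three sample rows are made up for illustration and are not part of the PySpark API or the original flights data.

```python
# Plain-Python sketch of the per-row logic; no Spark session is needed.
# The sample rows below are made up for illustration only.
flights_rows = [
    {"origin": "SEA", "dest": "LAX", "distance": 954, "air_time": 132},
    {"origin": "SEA", "dest": "HNL", "distance": 2677, "air_time": 360},
    {"origin": "PDX", "dest": "SFO", "distance": 550, "air_time": 90},
]


def with_column(rows, name, fn):
    """Hypothetical helper: return new rows with one extra key.

    Like .withColumn, it leaves the input rows untouched.
    """
    return [{**row, name: fn(row)} for row in rows]


def filter_rows(rows, predicate):
    """Hypothetical helper: keep only rows where predicate(row) is true."""
    return [row for row in rows if predicate(row)]


# air_time is in minutes, so dividing by 60 gives hours
with_hours = with_column(flights_rows, "duration_hrs", lambda r: r["air_time"] / 60)

# average speed = distance / hours in the air
with_speed = with_column(
    with_hours, "avg_speed", lambda r: r["distance"] / (r["air_time"] / 60)
)

# same condition as the string "distance >= 1000"
long_flights = filter_rows(with_speed, lambda r: r["distance"] >= 1000)

for row in long_flights:
    print(row["origin"], row["dest"], row["duration_hrs"], round(row["avg_speed"], 1))
```

As with the DataFrame calls, each step builds a new collection instead of modifying the previous one, which is why the result of .withColumn has to be assigned back to a variable.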