Pyspark
Ajay Ohri
PySpark made easy
```python
!pip install pyspark
```

    Collecting pyspark
      Downloading pyspark-2.2.0.post0.tar.gz (188.3MB)
    Collecting py4j==0.10.4 (from pyspark)
      Downloading py4j-0.10.4-py2.py3-none-any.whl (186kB)
    Building wheels for collected packages: pyspark
      Running setup.py bdist_wheel for pyspark: started
      Running setup.py bdist_wheel for pyspark: finished with status 'done'
      Stored in directory: C:\Users\Dell\AppData\Local\pip\Cache\wheels\5f\0b\b3\5cb16b15d28dcc32f8e7ec91a044829642874bb7586f6e6cbe
    Successfully built pyspark
    Installing collected packages: py4j, pyspark
    Successfully installed py4j-0.10.4 pyspark-2.2.0

```python
from pyspark import SparkContext,SparkConf
sc=SparkContext()
```

```python
import os
```

```python
os.getcwd()
```

    'C:\\Users\\Dell'

```python
os.chdir('C:\\Users\\Dell\\Desktop')
```

```python
os.listdir()
```

    ['desktop.ini', 'dump 2582017', 'Fusion Church.html', 'Fusion Church_files',
     'iris.csv', 'KOG', 'NF22997109906610.ETicket.pdf', 'R Packages',
     'Telegram.lnk', 'twitter_share.jpg', 'winutils.exe',
     '~$avel Reimbursements.docx', '~$thonajay.docx']

```python
#load data
data=sc.textFile('C:\\Users\\Dell\\Desktop\\iris.csv')
```

```python
type(data)
```

    pyspark.rdd.RDD

```python
data.top(1)
```

    ['7.9,3.8,6.4,2,"virginica"']

```python
data.first()
```

    '"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"'

```python
from pyspark.sql import SparkSession
```

```python
spark= SparkSession.builder \
    .master("local") \
    .appName("Data Exploration") \
    .getOrCreate()
```

```python
#load data as Spark DataFrame
data2=spark.read.format("csv") \
    .option("header","true") \
    .option("mode","DROPMALFORMED") \
    .load('C:\\Users\\Dell\\Desktop\\iris.csv')
```

```python
type(data2)
```

    pyspark.sql.dataframe.DataFrame

```python
data2.printSchema()
```

    root
     |-- Sepal.Length: string (nullable = true)
     |-- Sepal.Width: string (nullable = true)
     |-- Petal.Length: string (nullable = true)
     |-- Petal.Width: string (nullable = true)
     |-- Species: string (nullable = true)
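
The RDD calls above treat every line of the file as a plain string, which would explain why `data.first()` returns the header row and `data.top(1)` returns the line that sorts last as text rather than a numeric maximum. The all-string schema has a similar flavour: the CSV was read without asking for type inference. The sketch below imitates this in plain Python and does not call Spark. The header and the `7.9` line are taken from the outputs above; the `5.1` line is an assumed rendering of the first data row in the same format.

```python
# Sample lines: header and the 7.9 row appear in the outputs above;
# the 5.1 row is an assumption about how the raw file formats that record.
lines = [
    '"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"',
    '5.1,3.5,1.4,0.2,"setosa"',
    '7.9,3.8,6.4,2,"virginica"',
]

first_line = lines[0]  # plays the role of data.first(): the header row
top_line = max(lines)  # plays the role of data.top(1)[0]: largest string

# Splitting a line gives strings; numbers appear only after conversion.
fields = top_line.split(",")
sepal_length = float(fields[0])

print(first_line)
print(top_line)
print(type(fields[0]).__name__, sepal_length)
```

The header starts with a double quote, which sorts before any digit, so it can never win the string comparison; a numeric ordering would need the fields parsed first.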
```python
data2.columns
```

    ['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species']

```python
data2.schema.names
```

    ['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species']

```python
# current column names, needed by the rename below
oldColumns=data2.schema.names
newColumns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Species']
```

```python
from functools import reduce
```

```python
# rename each column in turn, carrying the DataFrame through reduce
data2 = reduce(
    lambda data2, idx: data2.withColumnRenamed(oldColumns[idx], newColumns[idx]),
    range(len(oldColumns)),
    data2)
data2.printSchema()
data2.show()
```

    root
     |-- Sepal_Length: string (nullable = true)
     |-- Sepal_Width: string (nullable = true)
     |-- Petal_Length: string (nullable = true)
     |-- Petal_Width: string (nullable = true)
     |-- Species: string (nullable = true)

    +------------+-----------+------------+-----------+-------+
    |Sepal_Length|Sepal_Width|Petal_Length|Petal_Width|Species|
    +------------+-----------+------------+-----------+-------+
    |         5.1|        3.5|         1.4|        0.2| setosa|
    |         4.9|          3|         1.4|        0.2| setosa|
    |         4.7|        3.2|         1.3|        0.2| setosa|
    |         4.6|        3.1|         1.5|        0.2| setosa|
    |           5|        3.6|         1.4|        0.2| setosa|
    |         5.4|        3.9|         1.7|        0.4| setosa|
    |         4.6|        3.4|         1.4|        0.3| setosa|
    |           5|        3.4|         1.5|        0.2| setosa|
    |         4.4|        2.9|         1.4|        0.2| setosa|
    |         4.9|        3.1|         1.5|        0.1| setosa|
    |         5.4|        3.7|         1.5|        0.2| setosa|
    |         4.8|        3.4|         1.6|        0.2| setosa|
    |         4.8|          3|         1.4|        0.1| setosa|
    |         4.3|          3|         1.1|        0.1| setosa|
    |         5.8|          4|         1.2|        0.2| setosa|
    |         5.7|        4.4|         1.5|        0.4| setosa|
    |         5.4|        3.9|         1.3|        0.4| setosa|
    |         5.1|        3.5|         1.4|        0.3| setosa|
    |         5.7|        3.8|         1.7|        0.3| setosa|
    |         5.1|        3.8|         1.5|        0.3| setosa|
    +------------+-----------+------------+-----------+-------+
    only showing top 20 rows

```python
data2.dtypes
```

    [('Sepal_Length', 'string'),
     ('Sepal_Width', 'string'),
     ('Petal_Length', 'string'),
     ('Petal_Width', 'string'),
     ('Species', 'string')]

```python
data3 = data2.select('Sepal_Length', 'Sepal_Width', 'Species')
data3.cache()
data3.count()
```

    150

```python
data3.show()
```

    +------------+-----------+-------+
    |Sepal_Length|Sepal_Width|Species|
    +------------+-----------+-------+
    |         5.1|        3.5| setosa|
    |         4.9|          3| setosa|
    |         4.7|        3.2| setosa|
    |         4.6|        3.1| setosa|
    |           5|        3.6| setosa|
    |         5.4|        3.9| setosa|
    |         4.6|        3.4| setosa|
    |           5|        3.4| setosa|
    |         4.4|        2.9| setosa|
    |         4.9|        3.1| setosa|
    |         5.4|        3.7| setosa|
    |         4.8|        3.4| setosa|
    |         4.8|          3| setosa|
    |         4.3|          3| setosa|
    |         5.8|          4| setosa|
    |         5.7|        4.4| setosa|
    |         5.4|        3.9| setosa|
    |         5.1|        3.5| setosa|
    |         5.7|        3.8| setosa|
    |         5.1|        3.8| setosa|
    +------------+-----------+-------+
    only showing top 20 rows

```python
data3.limit(5)
```

    DataFrame[Sepal_Length: string, Sepal_Width: string, Species: string]

```python
data3.limit(5).show()
```

    +------------+-----------+-------+
    |Sepal_Length|Sepal_Width|Species|
    +------------+-----------+-------+
    |         5.1|        3.5| setosa|
    |         4.9|          3| setosa|
    |         4.7|        3.2| setosa|
    |         4.6|        3.1| setosa|
    |           5|        3.6| setosa|
    +------------+-----------+-------+

```python
data3.limit(5).limit(2).show()
```

    +------------+-----------+-------+
    |Sepal_Length|Sepal_Width|Species|
    +------------+-----------+-------+
    |         5.1|        3.5| setosa|
    |         4.9|          3| setosa|
    +------------+-----------+-------+

```python
data4=data2.selectExpr('CAST(Sepal_Length AS INT) AS Sepal_Length')
```

```python
data4
```

    DataFrame[Sepal_Length: int]

```python
from pyspark.sql.functions import *
```

```python
data4.select('Sepal_Length').agg(mean('Sepal_Length')).show()
```

    +-----------------+
    |avg(Sepal_Length)|
    +-----------------+
    |5.386666666666667|
    +-----------------+

```python
data5=data2.selectExpr(
    'CAST(Sepal_Length AS INT) AS Sepal_Length',
    'CAST(Petal_Width AS INT) AS Petal_Width',
    'CAST(Sepal_Width AS INT) AS Sepal_Width',
    'CAST(Petal_Length AS INT) AS Petal_Length',
    'Species')
```

```python
data5
```

    DataFrame[Sepal_Length: int, Petal_Width: int, Sepal_Width: int, Petal_Length: int, Species: string]

```python
data5.columns
```

    ['Sepal_Length', 'Petal_Width', 'Sepal_Width', 'Petal_Length', 'Species']

```python
data5.select('Sepal_Length','Species').groupBy('Species').agg(mean("Sepal_Length")).show()
```

    +----------+-----------------+
    |   Species|avg(Sepal_Length)|
    +----------+-----------------+
    | virginica|             6.08|
    |versicolor|             5.48|
    |    setosa|              4.6|
    +----------+-----------------+

```python
#df = data3.select(col('Sepal_Length'),dat.Sepal_Length.cast('float').alias('price'))
```
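
The averages above are computed on columns cast to INT, and the results (for example 4.6 for setosa, where every displayed value is at least 4.3) suggest that the cast drops the fractional part of each value before the mean is taken. The plain-Python sketch below reproduces that effect on the five `Sepal_Length` values printed by `data3.limit(5).show()`. Here `int()` stands in for the SQL cast, which is an assumption about its truncating behaviour, and no Spark call is involved.

```python
from statistics import mean

# First five Sepal_Length values, as printed by data3.limit(5).show().
raw = ["5.1", "4.9", "4.7", "4.6", "5"]

as_float = [float(v) for v in raw]
as_int = [int(f) for f in as_float]  # int() truncates toward zero

mean_float = mean(as_float)  # average of the values as recorded
mean_int = mean(as_int)      # average after dropping the decimals

print(as_int)
print(round(mean_float, 2), mean_int)
```

If the decimals matter for the analysis, casting to `DOUBLE` instead of `INT` in the `selectExpr` calls would likely be the better fit.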