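- How many students are there in total? map(), distinct(), count()
- How many courses are offered?
- How many courses does each student take? map(), countByKey()
- How many students take each course? map(), countByValue()
- How many courses does Tom take, and what is his score in each? filter(), map() (result: RDD)
- How many courses does Tom take, and what is his score in each? map(), lookup() (result: list)
- Sort Tom's scores by value. filter(), map(), sortBy()
- Tom's average score. map(), lookup(), mean()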
hadoop@dblab-VirtualBox:~$ pyspark
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/
Using Python version 3.5.1+ (default, Mar 30 2016 22:46:26)
SparkContext available as sc, HiveContext available as sqlContext.
>>> l=sc.textFile('file:///home/hadoop/桌面/chapter4-data01.txt')
>>> l.map(lambda l:l.split(',')).map(lambda l:(l[0])).distinct().count()
265
>>> l.map(lambda l:l.split(',')).map(lambda l:(l[1])).distinct().count()
8
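Both `distinct().count()` calls above split each line on commas, keep one field (index 0 for the student name, index 1 for the course) and count the unique values. The sketch below reproduces that computation in plain Python on a small hypothetical sample, so it runs without a Spark installation. It assumes each line has the layout `name,course,score`; the session only confirms the first two fields.

```python
# Hypothetical sample in the assumed "name,course,score" layout;
# it is not taken from chapter4-data01.txt.
SAMPLE = [
    "Tom,DataBase,80",
    "Tom,Algorithm,50",
    "Tom,Python,95",
    "Jim,DataBase,90",
    "Jim,Python,40",
    "Lucy,Software,70",
]


def count_distinct(lines, field):
    """Local analogue of rdd.map(split).map(pick field).distinct().count()."""
    # map(): split every line on commas and keep one field
    values = (line.split(",")[field] for line in lines)
    # distinct().count(): a set holds each value once, len() counts them
    return len(set(values))


print(count_distinct(SAMPLE, 0))  # distinct students
print(count_distinct(SAMPLE, 1))  # distinct courses
```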
>>> l.map(lambda l:l.split(',')).map(lambda l:(l[1])).countByValue()
defaultdict(<class 'int'>, {'ComputerNetwork': 142, 'Software': 132, 'DataBase': 126, 'Algorithm': 144, 'OperatingSystem': 134, 'Python': 136, 'DataStructure': 131, 'CLanguage': 128})
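`countByValue()` is an action that returns a local dictionary mapping each distinct element to its number of occurrences, which is why the result above is a `defaultdict`. A plain-Python equivalent uses `collections.Counter`. The sample is hypothetical. The count equals the number of students per course only if no student appears twice for the same course, which is an assumption about the data.

```python
from collections import Counter

# Hypothetical sample in the assumed "name,course,score" layout.
SAMPLE = [
    "Tom,DataBase,80",
    "Tom,Algorithm,50",
    "Tom,Python,95",
    "Jim,DataBase,90",
    "Jim,Python,40",
    "Lucy,Software,70",
]


def students_per_course(lines):
    """Local analogue of rdd.map(split).map(lambda f: f[1]).countByValue()."""
    # One occurrence per line; this equals students per course only if no
    # (name, course) pair is repeated in the input (an assumption).
    return Counter(line.split(",")[1] for line in lines)


print(dict(students_per_course(SAMPLE)))
```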
>>> l.map(lambda l:l.split(',')).map(lambda l:(l[0])).countByValue()
defaultdict(<class 'int'>, {'Lewis': 4, 'Mike': 3, 'Walter': 4, 'Conrad': 2, 'Borg': 4, 'Bert': 3, 'Eli': 5, 'Clare': 4, 'Charles': 3, 'Alston': 4, 'Scott': 3, 'Angelo': 2, 'Christopher': 4, 'Webb': 7, 'Bill': 2, 'Rock': 6, 'Jonathan': 4, 'Bevis': 4, 'Spencer': 5, 'Arlen': 4, 'Dempsey': 4, 'Colin': 5, 'Bernard': 2, 'Cleveland': 4, 'Aaron': 4, 'Dennis': 4, 'Ward': 4, 'Marico': 6, 'Clark': 6, 'Donald': 4, 'Vic': 3, 'Bartholomew': 5, 'Luthers': 5, 'Elijah': 4, 'Derrick': 6, 'Hobart': 4, 'Joshua': 4, 'Blake': 4, 'Tracy': 3, 'Alvin': 5, 'Sidney': 5, 'Joseph': 3, 'Gordon': 4, 'Elvis': 2, 'Roderick': 4, 'Abel': 4, 'William': 6, 'Devin': 4, 'Antonio': 3, 'Brian': 6, 'Lester': 4, 'Bart': 5, 'Dean': 7, 'Adair': 3, 'Jeremy': 6, 'Boyce': 2, 'Hilary': 4, 'Blithe': 3, '*': 4, 'Booth': 6, 'Les': 6, 'Mick': 4, 'Jason': 4, 'Phil': 3, 'Max': 3, 'Beck': 4, 'Cliff': 5, 'Nick': 5, 'Jeff': 4, 'Kerwin': 3, 'George': 4, 'Sebastian': 6, 'Egbert': 4, 'Archer': 5, 'Blair': 4, 'Boyd': 3, 'Brady': 5, 'Drew': 5, 'Jerome': 3, 'Adam': 3, 'Benson': 4, 'Bradley': 2, 'Jonas': 4, 'Harlan': 6, 'Emmanuel': 3, 'Lou': 2, 'Basil': 4, 'Brandon': 5, 'Ford': 3, 'Will': 3, 'Bowen': 5, 'Andy': 3, 'Greg': 4, 'Alva': 5, 'Willie': 4, 'Lionel': 4, 'Armstrong': 2, 'Bruno': 5, 'Levi': 2, 'Kerr': 4, 'Aldrich': 3, 'Payne': 6, 'Rodney': 3, 'Duke': 4, 'Jeffrey': 4, 'Marsh': 4, 'Christian': 2, 'Adolph': 4, 'Bertram': 3, 'Ernest': 5, 'Claude': 2, 'Merlin': 5, 'Truman': 3, 'Webster': 2, 'Ivan': 4, 'Clement': 5, 'Alvis': 6, 'Abraham': 3, 'Newman': 2, 'Valentine': 8, 'Monroe': 3, 'Nigel': 3, 'Gilbert': 3, 'Amos': 5, 'Harold': 4, 'Giles': 7, 'Glenn': 6, 'Gerald': 4, 'Solomon': 5, 'Armand': 3, 'Matthew': 2, 'Elliot': 3, 'Ben': 4, 'Benjamin': 4, 'Donahue': 5, 'Samuel': 4, 'Sandy': 1, 'Bernie': 3, 'Griffith': 4, 'Abbott': 3, 'Maxwell': 4, 'Kennedy': 4, 'Frank': 3, 'Randolph': 3, 'Boris': 6, 'Simon': 2, 'Colbert': 4, 'Benedict': 6, 'Jerry': 3, 'Edward': 4, 'Harvey': 7, 'Baron': 6, 'Horace': 5, 'Bennett': 6, 'Broderick': 3, 
'Robin': 4, 'Elroy': 5, 'Bing': 6, 'Louis': 6, 'Bishop': 2, 'Mark': 7, 'Clyde': 7, 'Pete': 3, 'Martin': 3, 'Corey': 4, 'Bruce': 3, 'Alger': 5, 'Clarence': 7, 'Meredith': 4, 'Rod': 4, 'Todd': 3, 'Merle': 3, 'Archibald': 5, 'Woodrow': 3, 'Kent': 4, 'Chapman': 4, 'Nelson': 5, 'Kevin': 4, 'Ron': 6, 'Ronald': 3, 'Marshall': 4, 'Hiram': 6, 'Wright': 4, 'Virgil': 5, 'Leo': 5, 'Dominic': 4, 'Allen': 4, 'Len': 5, 'Henry': 2, 'Lennon': 4, 'Eugene': 1, 'Andrew': 4, 'Perry': 5, 'Berg': 4, 'Winfred': 3, 'Aries': 2, 'Roy': 6, 'John': 6, 'Sid': 3, 'Alfred': 2, 'Channing': 4, 'Marlon': 4, 'Elmer': 4, 'Stan': 3, 'Marvin': 3, 'Jo': 5, 'Herman': 3, 'Hogan': 4, 'Chad': 6, 'Matt': 4, 'Upton': 5, 'Leopold': 7, 'Sean': 6, 'Tom': 5, 'Eric': 4, 'Francis': 4, 'Vincent': 5, 'Ellis': 4, 'Noah': 4, 'Winston': 4, 'Lucien': 5, 'Kenneth': 3, 'Peter': 4, 'Rory': 4, 'Leonard': 2, 'Dick': 3, 'Enoch': 3, 'Harry': 4, 'Hayden': 3, 'Kim': 4, 'Antony': 5, 'Christ': 2, 'Alexander': 4, 'Ken': 3, 'Albert': 3, 'Elton': 5, 'Montague': 3, 'Nicholas': 5, 'Milo': 2, 'Geoffrey': 4, 'Barton': 1, 'Jay': 6, 'Victor': 2, 'Uriah': 1, 'Miles': 6, 'Saxon': 7, 'Verne': 3, 'Jim': 4, 'Duncann': 5, 'Don': 2, 'Maurice': 2, 'Raymondt': 6, 'Michael': 5, 'Barry': 5, 'Evan': 3, 'Tony': 3, 'Herbert': 3, 'Adonis': 5, 'Bob': 3, 'Alan': 5, 'Colby': 4, 'Chester': 6, 'Wordsworth': 4, 'Jesse': 7, 'Philip': 2})
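The task list suggests `countByKey()` for the per-student count, but the session applies `countByValue()` to the name field. The two give the same numbers. `countByKey()` is meant for an RDD of key/value pairs, for example `l.map(lambda l:l.split(',')).map(lambda f:(f[0],f[1])).countByKey()`, and it counts elements per key without inspecting the values. The sketch below shows a local version of that behaviour on a hypothetical sample.

```python
from collections import defaultdict

# Hypothetical sample in the assumed "name,course,score" layout.
SAMPLE = [
    "Tom,DataBase,80",
    "Tom,Algorithm,50",
    "Tom,Python,95",
    "Jim,DataBase,90",
    "Jim,Python,40",
    "Lucy,Software,70",
]


def courses_per_student(lines):
    """Local analogue of rdd.map(split).map(lambda f: (f[0], f[1])).countByKey()."""
    # map(): build (name, course) pairs
    pairs = [(f[0], f[1]) for f in (line.split(",") for line in lines)]
    counts = defaultdict(int)
    for key, _course in pairs:
        # countByKey() counts elements per key; the value is not inspected
        counts[key] += 1
    return counts


print(dict(courses_per_student(SAMPLE)))
```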
>>> l.filter(lambda l:'Tom' in l).map(lambda l:l.split(',')).count()
5
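`'Tom' in l` tests whether the substring `Tom` appears anywhere in the raw line. It would therefore also keep lines for a name such as a hypothetical `Tomas`, or any line whose course name contains `Tom`. With the names listed above, it happens to select only Tom's lines. Comparing the name field exactly is more robust. In PySpark that could be written `l.map(lambda l:l.split(',')).filter(lambda f:f[0]=='Tom')`. Calling `collect()` instead of `count()` would then return each of Tom's records with its score; the session only counts them. The local sketch below contrasts the two filters on a hypothetical sample that includes a `Tomas` line.

```python
# Hypothetical sample; "Tomas" is an invented name added to show the pitfall.
SAMPLE = [
    "Tom,DataBase,80",
    "Tom,Algorithm,50",
    "Tom,Python,95",
    "Jim,DataBase,90",
    "Tomas,Python,60",
]


def substring_filter(lines, name):
    """What filter(lambda l: name in l) keeps: any line containing the text."""
    return [line for line in lines if name in line]


def exact_filter(lines, name):
    """Local analogue of rdd.map(split).filter(lambda f: f[0] == name)."""
    records = (line.split(",") for line in lines)
    return [f for f in records if f[0] == name]


print(len(substring_filter(SAMPLE, "Tom")))  # also matches the Tomas line
print(exact_filter(SAMPLE, "Tom"))           # only Tom's records, with scores
```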
>>> l.filter(lambda l:'Tom' in l).map(lambda l:l.split(',')).lookup('Tom')
['DataBase', 'Algorithm', 'OperatingSystem', 'Python', 'Software']
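`lookup(key)` keeps the elements whose first item equals `key` and returns their second item as a Python list. On the split lines, the second item is the course name. That is why only courses come back above and the scores are missing. Mapping each record to a `(name, (course, score))` pair first, for example `map(lambda f:(f[0],(f[1],int(f[2]))))`, makes `lookup('Tom')` return course/score tuples. This assumes the third field is an integer score. The local sketch below uses a hand-written `lookup` helper (hypothetical, not Spark's implementation) and a hypothetical sample.

```python
# Hypothetical sample in the assumed "name,course,score" layout.
SAMPLE = [
    "Tom,DataBase,80",
    "Tom,Algorithm,50",
    "Tom,Python,95",
    "Jim,DataBase,90",
]


def lookup(elements, key):
    """Hypothetical helper mimicking RDD.lookup(): the second item of every
    element whose first item equals key."""
    return [e[1] for e in elements if e[0] == key]


records = [line.split(",") for line in SAMPLE]

# Split lines only: the "value" is field 1, i.e. the course name
courses = lookup(records, "Tom")

# (name, (course, score)) pairs: lookup now returns course/score tuples
# (assumes the third field is an integer score)
pairs = [(f[0], (f[1], int(f[2]))) for f in records]
course_scores = lookup(pairs, "Tom")

print(courses)
print(len(course_scores), course_scores)
```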
>>>
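The session stops before the last two tasks. One way to approach them, assuming the third field is an integer score, is as follows. First select Tom's `(course, score)` pairs. Then sort them by score with `sortBy(lambda x:x[1], ascending=False)`; highest first is a choice here, since the task does not fix the order. Finally, average the scores. Note that `lookup()` returns a plain Python list, which has no `mean()` method. You can either call `mean()` on the RDD of scores before collecting (for example `...filter(lambda f:f[0]=='Tom').map(lambda f:int(f[2])).mean()`) or average the looked-up list locally. The plain-Python sketch below does the local version on a hypothetical sample.

```python
from statistics import mean

# Hypothetical sample in the assumed "name,course,score" layout.
SAMPLE = [
    "Tom,DataBase,80",
    "Tom,Algorithm,50",
    "Tom,Python,95",
    "Jim,DataBase,90",
]


def scores_for(lines, name):
    """Local analogue of rdd.map(split).filter(lambda f: f[0] == name)
    .map(lambda f: (f[1], int(f[2])))."""
    records = (line.split(",") for line in lines)
    return [(f[1], int(f[2])) for f in records if f[0] == name]


def sort_by_score(pairs, descending=True):
    """Local analogue of sortBy(lambda x: x[1], ascending=not descending)."""
    return sorted(pairs, key=lambda x: x[1], reverse=descending)


tom = scores_for(SAMPLE, "Tom")
print(sort_by_score(tom))
# Average of a plain Python list, as one would compute after lookup()
print(mean([score for _course, score in tom]))
```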