Return a new RDD containing only the lines that contain a specific word:
// get the lines that contain a specific word
val lignes = Alice.filter(line => line.contains("walking"))
// save to a folder (saveAsTextFile returns Unit, so there is nothing to assign)
lignes.saveAsTextFile("file:<folder path>/walking")
lignes.collect()
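Count the number of occurrences of a specific word:
// split the text into words (assuming words are separated by single spaces)
val words = Alice.flatMap(line => line.split(" "))
// count the number of occurrences of the word "was"
// note: contains also matches longer words that include "was"
words.filter(l => l.contains("was")).count()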
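The difference between substring matching and exact matching is easy to miss, so here is a minimal sketch on a small in-memory collection. The sample sentences and the names `countDemoLines`, `countDemoWords`, `substringCount` and `exactCount` are hypothetical and not part of the original example. It uses plain Scala collections so it runs without a cluster; on an RDD the same lambdas would be passed to `filter(...)` followed by `count()`.

```scala
// Hypothetical sample lines standing in for the text (assumption for illustration).
val countDemoLines = Seq(
  "she was washing the cup",
  "it was not hers"
)

// Split each line into words on single spaces.
val countDemoWords = countDemoLines.flatMap(line => line.split(" "))

// Substring match: also counts "washing" because it contains "was".
val substringCount = countDemoWords.filter(w => w.contains("was")).size

// Exact match: counts only the word "was" itself.
val exactCount = countDemoWords.filter(w => w == "was").size

println(s"substring: $substringCount, exact: $exactCount") // prints "substring: 3, exact: 2"
```

Exact matching is stricter, but it would miss tokens such as "was," unless punctuation is removed first, so which filter is appropriate depends on how the text has been cleaned.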
Return the longest line in the text:
// pair each line with its length
val longueursLignes = Alice.map(l => (l.length, l))
// sort by length in descending order and take the first (longest) line
longueursLignes.sortByKey(false, 1).first()
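Sorting every line just to read the first one does more work than strictly needed. As a possible alternative, the sketch below keeps the larger of two (length, line) pairs in a single pass with `reduce`, a method that RDDs also provide. The sample lines and the names `lengthDemoLines`, `lengthPairs` and `longest` are hypothetical, and plain Scala collections are used so the snippet runs on its own.

```scala
// Hypothetical sample lines; any collection of strings would do.
val lengthDemoLines = Seq("short", "a much longer line", "medium one")

// Same (length, line) pairs as in the snippet above.
val lengthPairs = lengthDemoLines.map(l => (l.length, l))

// Keep whichever pair has the larger length: one pass, no sorting.
val longest = lengthPairs.reduce((a, b) => if (a._1 >= b._1) a else b)

println(longest) // prints "(18,a much longer line)"
```

When several lines share the maximum length, which one is returned depends on the order in which pairs are combined, so this is only a sketch of the idea rather than a drop-in replacement.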
Show the list of anagram words:
// lowercase each line and remove special characters
val textPure = Alice.map(line => line.toLowerCase().replace(":", "").replace("*", "").replace("/", "").replace("[", "").replace("]", "").replace(",", "").replace("\"", "").replace("-", "").replace("`", "").replace("'", "").replace("_", "").replace(";", "").replace(".", "").replace("!", "").replace("?", ""))
// split the text into words (assuming words are separated by single spaces)
val words = textPure.flatMap(line => line.split(" "))
// remove duplicate words
val word = words.distinct()
// build (key, word) pairs, where the key is the word's letters in sorted order
val wordl = word.map(line => (line.toList.sorted, line))
// group words that share the same key into one comma-separated string
val anag = wordl.reduceByKey((a, b) => if (a.contains(b)) a else a + ", " + b)
// view the RDD content
anag.collect
// save to a folder
anag.saveAsTextFile("file:<folder path>/anagram")
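The core idea is that two words are anagrams exactly when their letters, once sorted, are identical. The sketch below illustrates that key on a tiny in-memory list and, unlike the RDD version above, drops words that have no anagram partner. The word list and the names `anagramDemoWords`, `keyedWords`, `groupedByKey` and `anagramGroups` are hypothetical. Plain Scala collections are used here; on an RDD the grouping step would typically be `groupByKey` or `reduceByKey`.

```scala
// Hypothetical word list, assumed already lowercased and de-duplicated.
val anagramDemoWords = Seq("listen", "silent", "enlist", "cat", "act", "dog")

// Key each word by its letters in sorted order: anagrams share the same key.
val keyedWords = anagramDemoWords.map(w => (w.sorted, w))

// Collect all words that share a key.
val groupedByKey = keyedWords.groupBy(_._1)

// Keep only groups with more than one word, sorted for stable output.
val anagramGroups = groupedByKey.values.map(pairs => pairs.map(_._2).sorted).filter(_.size > 1).toList.sortBy(_.head)

anagramGroups.foreach(g => println(g.mkString(", ")))
// prints:
// act, cat
// enlist, listen, silent
```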
Example 2:
Build an RDD from a text file (each line holds a Twitter account followed by another account):
// split the text into user IDs (assuming IDs are separated by single spaces)
val usersDouble = Twitter.flatMap(line => line.split(" "))
// remove duplicate user IDs and count the distinct users
val users = usersDouble.distinct().count()
Get the number of followers of each user:
// generate (user ID, 1) from the first ID on each line (assuming space-separated IDs)
val degres = Twitter.map(x => (x.split(" ")(0), 1))
// sum the counts per user ID
val degre = degres.reduceByKey(_ + _)
// save to a folder
degre.saveAsTextFile("file:/home/cloudera/degre")
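To make the (user ID, 1) pattern concrete, here is a minimal sketch on a hypothetical four-line edge list. It assumes each line is two space-separated IDs and that the first ID is the account being counted; whether that column holds the followed account or the follower depends on the actual file. The names `demoEdges`, `ones` and `counts` are illustrative only. Plain Scala collections are used, with `groupBy` plus a sum standing in for what `reduceByKey(_ + _)` does on an RDD.

```scala
// Hypothetical edge list: each line is "userId otherUserId", space-separated.
val demoEdges = Seq("1 2", "1 3", "2 3", "1 4")

// Emit (first ID on the line, 1) for every edge.
val ones = demoEdges.map(x => (x.split(" ")(0), 1))

// Sum the 1s per user ID.
val counts = ones.groupBy(_._1).map { case (id, pairs) => (id, pairs.map(_._2).sum) }

counts.toList.sortBy(_._1).foreach(println)
// prints:
// (1,3)
// (2,1)
```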
Get the Twitter users who have at least 1000 followers:
// generate (user ID, 1) from the first ID on each line (assuming space-separated IDs)
val degres = Twitter.map(x => (x.split(" ")(0), 1))
// sum the counts per user ID
val degre = degres.reduceByKey(_ + _)
// keep the users who have at least 1000 followers
val Filtred = degre.filter { case (x, y) => y >= 1000 }
// count the users who have at least 1000 followers
Filtred.count()
// save the users who have at least 1000 followers
Filtred.saveAsTextFile("file:/home/cloudera/1000")