JerryLead / SparkInternals

Notes talking about the design and implementation of Apache Spark

Would it be possible to have this in English?

jcortejoso opened this issue.

I have gone through your documentation (the images and examples, plus a translator for the text) and it looks pretty awesome, but I wonder whether we could have it in English, because I think it is worth it.

Thank you.

If @JerryLead is interested in this, I can also help to translate.

Thanks, @darkjh. From English I could translate it into Spanish. Furthermore, I think that if this documentation were available in more languages (English in particular), more people could help keep it up to date with new Spark versions.

Thanks!

Thanks for your interest, @jcortejoso. I'm sorry, but I do not have enough free time to translate it into English right now. One reason is that I'm busy preparing a research paper (the daily hard life of a PhD student). The other is that some details of this document need to be updated for the latest Spark 1.3.

Thanks, @darkjh. If you are interested in this translation work, we can do it together. We can discuss how to collaborate over IM (I've sent my QQ ID to your Gmail).

In addition, I've noticed that there are already many books about Spark, such as "Learning Spark: Lightning-Fast Big Data Analysis", "Advanced Analytics with Spark: Patterns for Learning from Data at Scale", and Matei's PhD thesis.

Hi @JerryLead,
I would also like to make this happen. Please let me know the progress if you don't mind.

It is great! Thank you all for your enthusiasm.

@darkjh has translated 0-Introduction.md, and I've merged his pull request into master/markdown/english/0-Introduction.md.

I do not know whether @darkjh is translating 1-Overview.md. In any case, 2-JobLogicalPlan.md has not been translated yet, so @invkrh, you can work on 2-JobLogicalPlan.md if you are interested.

NOTE:
(1) Some details of this document may not be consistent with the latest Spark 1.3. When you find something inconsistent, feel free to modify and update it in the English version. You can also use brackets such as "(in Spark 1.3, it is ...)" to mark such places.

(2) We do not need to translate every sentence, since our goal is to convey the important design and implementation details. Since "important" is hard to define, feel free to translate what you think matters.

(3) You can add your understanding if you think some important points are missing.

When this work is done, I will review the translation carefully and add all contributors' names to the author list of the English version.

@JerryLead

2-JobLogicalPlan.md is in progress (50%)

Some logical plans involving shuffle have changed in newer versions.
For example, groupByKey no longer produces a MapPartitionsRDD after the ShuffledRDD. According to the code, combineByKey now does the work that aggregate + mapPartitions used to do, so only a ShuffledRDD is produced. I checked this by calling toDebugString right after groupByKey.
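To illustrate the point above, here is a toy, single-process Python model of how groupByKey can be expressed through combineByKey. It is not Spark code: the functions `combine_by_key` and `group_by_key`, the list-based hash buckets standing in for the shuffle, and Python lists standing in for Spark's CompactBuffer are all assumptions made for illustration. As far as we can tell from the Spark 1.x source, groupByKey hands combineByKey a create-combiner / merge-value / merge-combiners triple with mapSideCombine = false, so values are grouped directly during the shuffle read and no extra RDD has to follow the ShuffledRDD. In a real Spark shell, calling toDebugString on the result of groupByKey is the way to confirm the lineage.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners,
                   num_reducers, map_side_combine=True):
    """Toy, single-process model of combineByKey semantics (hypothetical, not Spark code)."""
    # "Shuffle write": route each record to a reducer bucket by hash(key).
    buckets = [[] for _ in range(num_reducers)]
    for part in partitions:
        if map_side_combine:
            # Pre-combine values per key inside each map-side partition.
            local = {}
            for k, v in part:
                local[k] = merge_value(local[k], v) if k in local else create_combiner(v)
            items = local.items()
        else:
            # groupByKey style: ship raw values, nothing is combined on the map side.
            items = part
        for k, x in items:
            buckets[hash(k) % num_reducers].append((k, x))

    # "Shuffle read": each reducer builds exactly one combiner per key.
    result = []
    for bucket in buckets:
        acc = {}
        for k, x in bucket:
            if map_side_combine:
                # x is already a combiner; merge combiners coming from different maps.
                acc[k] = merge_combiners(acc[k], x) if k in acc else x
            else:
                # x is a raw value; start or extend this key's combiner.
                acc[k] = merge_value(acc[k], x) if k in acc else create_combiner(x)
        result.append(acc)
    return result


def group_by_key(partitions, num_reducers):
    # Mirrors the shape of the combiner functions groupByKey passes to combineByKey,
    # with plain lists standing in for Spark's CompactBuffer.
    return combine_by_key(
        partitions,
        create_combiner=lambda v: [v],
        merge_value=lambda buf, v: buf + [v],
        merge_combiners=lambda b1, b2: b1 + b2,
        num_reducers=num_reducers,
        map_side_combine=False,
    )
```

For example, `group_by_key([[(1, "a"), (2, "b")], [(1, "c")]], 2)` puts key 2 in one reducer and key 1 (with values "a" and "c") in the other, all in a single shuffle step. Passing `map_side_combine=True` with summing functions instead models a reduceByKey-style aggregation, where pre-combining on the map side does shrink the shuffled data.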

I have updated the text for Spark 1.3. The affected figures can be changed later.

Great, @invkrh. I will review the code and change the figure.

@darkjh, the translation of 2-JobLogicalPlan will be done by @invkrh. Please comment on this issue if you are going to translate other chapters.

@JerryLead I know @invkrh in person, so no problem there.
For the moment I'll just translate and not look too closely at what changed in 1.3. This lets us get an English version out more quickly; afterwards I'll do a pass to find what is not consistent with 1.3.

Hi, @JerryLead

I agree with @darkjh. It would be easier to get a quick English version first. For the moment, we could just mark the inconsistencies, e.g. [changed in 1.3]. All marked parts will be re-edited once the translation is finished.

Thank you, @darkjh and @invkrh, that is a good idea.

I will take chapter 4, on the shuffle process.

Thanks guys. Awesome job!!

Chapter 5 is also in progress.

@JerryLead I rebuilt my Windows and Linux systems this weekend and it took me some time. I hope to get back to the translation tomorrow.

@darkjh Thanks for the contribution; take your time.

Chapter 6 is now underway.

So I'll take the last one.

Translation finished for Spark 1.0.

You are awesome, @darkjh and @invkrh! I will carefully review the translation next month. Do you have an SNS ID (e.g., a Weibo or Twitter ID)? I want to list the IDs along with your names in the author list.

@JerryLead Here's my twitter @juhanlol

In 4-shuffleDetails.md a figure (reduceByKey) is apparently missing!

It is OK now, thanks @19luke89