Categories
Software Development

Apache Beam: One-Stop Big Data Processing and Analysis



![](https://static001.geekbang.org/infoq/08/0888bcc84c3d50c1973919411ee10aea.png)

## 1. Introduction

Big data processing is frequently underestimated, and many teams lack a sound processing architecture. Without high-quality data processing pipelines, "artificial intelligence" ends up all artificial and no intelligence. Data volumes keep growing, yet teams underestimate the complexity that scale brings. Jesse Anderson, a leading figure in the big data field, studied this and found that in a well-structured AI organization, data engineers should make up about four fifths of the team, something many teams have yet to recognize. Big data processing involves a large number of complex factors, and Apache Beam lowers exactly that difficulty: it is a conceptual product, and every user can keep extending it from its concepts.

Apache Beam provides one unified API for the two data processing modes, batch and streaming, so that we can concentrate on the data processing algorithms instead of spending time maintaining the differences between the two modes.

Our company's business scenario for Beam is a data engine service; other middle-platform products, such as data exchange, a computation development platform, and data analytics, build further services on top of it. The "middle platform" concept is not the focus of this chapter and is not expanded on here; most of the various so-called middle platforms are really just business platforms.

## 2. The Programming Model

Real-world application requirements can be complex. For example, suppose Hive holds two source tables with the same schema. What we want to do is: by date, incrementally let the new version update fields of the old version's data, append another increment of new data, and finally produce a result table. Implementation of the case: https://www.infoq.cn/article/Wa7lr45MxmNjKFTMJQRJ

![](https://static001.geekbang.org/infoq/6a/6ae912c6704b63dede5e00cb115cac0d.png)

*Architecture flow*

This case contains several different processing modules that are finally connected together into a directed acyclic graph, called a workflow system (Workflow System). In such a system a single simple data transformation is not enough; four common design patterns are involved.

### 2.1 Workflow

**Copy pattern:**

The copy pattern copies the data of a single processing module, in full, into two or more processing modules, which then process it separately.

![](https://static001.geekbang.org/infoq/b4/b4705aa83627b3e28ceff2b00eaddac0.png)

*Copy pattern*

Example: a result set is consumed by different processing flows and written out to different databases.

**Filter pattern:**

Filter out the data that does not satisfy a particular condition.

![](https://static001.geekbang.org/infoq/5b/5b5693739c15e167126740266892af79.png)

*Filter pattern*

Example: sifting a result set through a series of rules.

**Separation pattern:**

If you do not want to discard any data in the set, but rather want to classify it into different categories for different processing, you need the separation pattern.

![](https://static001.geekbang.org/infoq/d4/d432dd8a2e22d9cca8a73ffe24d304bf.png)

*Separation pattern*

Example: tiering the entire user base and observing the behavior of each group, for user growth analysis.

**Merge pattern:**

The merge pattern gathers several different data sets into one total data set and then processes that total data set in a single workflow.

![](https://static001.geekbang.org/infoq/42/42d1ec069e46d0276c47fc5124dfc4ce.png)

*Merge pattern*

Example: after the data is fused, a single result set is produced.

### 2.2 The Lambda Architecture

The Lambda Architecture is a big data processing architecture proposed by Twitter engineer Nathan Marz, based on his experience with distributed data processing systems at BackType and Twitter.

The Lambda Architecture lets developers build large-scale distributed data processing systems. It offers good flexibility and scalability, as well as good tolerance of hardware failures and human error. It consists of three layers: the Batch Layer, the Speed Layer, and the Serving Layer that answers queries.

![](https://static001.geekbang.org/infoq/ce/ce4bf37a1cca7754d2ceb213a02efd47.png)

This architecture is stable, but the offline and real-time computations end up with redundant code, and if tasks are executed alternately on fairly complex engines, the maintenance burden is high. Real-time computation is used to make up for the shortcomings of offline computation.

### 2.3 The Kappa Architecture

The Kappa Architecture is an architectural idea proposed by Jay Kreps, former principal engineer at LinkedIn. Kreps is one of the authors of several famous open-source projects, including Apache Kafka and the stream processing system Apache Samza, and is now CEO of the big data company Confluent.

Kreps proposed an improvement on the Lambda Architecture:

> Can we improve the performance of the Lambda Architecture's speed layer so that it can also handle data completeness and accuracy? Can we improve the speed layer so that it both performs real-time processing and is able, when business logic is updated, to reprocess historical data it has processed before?

![](https://static001.geekbang.org/infoq/1d/1d2ec4660da22eab60a4b208235b39c8.png)

This architecture is really an extension of Kafka's performance characteristics, stretched out from its capacity for massive storage: it can both store historical data and process real-time data. It is less stable, however, and Kafka must be maintained carefully; the computing engine LinkedIn open-sourced is also meant to be used with this architecture.

### 2.4 Summary

In technology, pay attention to what does not change in order to isolate what does; only on top of these ideas can a whole family of services be extended and the entire system gain life. Technology is a matter of insight: approach your predecessors in order to eventually move beyond them.
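The four patterns above map directly onto Beam's core transforms. Below is a minimal Java sketch; the sample data and branch names are made up for illustration and are not from the case above. The copy pattern is simply applying one PCollection to several branches, the filter pattern is `Filter.by`, the separation pattern is `Partition`, and the merge pattern is `Flatten`.

```java
Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

PCollection<String> input = pipeline.apply(Create.of("alpha", "beta", "gamma"));

// Copy pattern: the same PCollection feeds two independent branches.
PCollection<String> upper = input.apply("Upper",
    MapElements.into(TypeDescriptors.strings()).via((String s) -> s.toUpperCase()));
PCollection<String> tagged = input.apply("TagLength",
    MapElements.into(TypeDescriptors.strings()).via((String s) -> s.length() + ":" + s));

// Filter pattern: keep only the elements that satisfy the predicate.
PCollection<String> startsWithA = input.apply(Filter.by((String s) -> s.startsWith("a")));

// Separation pattern: split into partitions without discarding anything.
PCollectionList<String> parts = input.apply(
    Partition.of(2, (String word, int numPartitions) -> word.length() % numPartitions));

// Merge pattern: flatten several PCollections back into one.
PCollection<String> merged =
    PCollectionList.of(upper).and(tagged).apply(Flatten.pCollections());
```

Each branch here is independent of the others, so a runner is free to execute them in parallel.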
## 3. PCollection

### 3.1 A Brief History of Apache Beam

- Before 2003, Google had no mature internal framework for processing large-scale data.
- In 2004, Google published the paper "MapReduce: Simplified Data Processing on Large Clusters", summarizing the MapReduce architectural ideas. The hope was to offer a concise API for expressing engineers' data processing logic, and underneath that API to embed a highly scalable fault-tolerance system, so that engineers could put their minds on the processing logic rather than on designing distributed fault tolerance.
- In 2010, Google published the FlumeJava paper. FlumeJava abstracts all data into a structure named PCollection, whether the data is read from memory or from files in a distributed environment. The benefit is that test code can run both in a distributed environment and in single-machine memory.
- In 2013, Google published the MillWheel ideas, which integrate the strengths of several large-scale data processing frameworks toward one unified framework.
- In 2015, Google published the Dataflow Model paper and, at the same time, launched Cloud Dataflow, a platform based on the Dataflow Model, letting engineers outside Google use these SDKs to write large-scale data processing logic.
- In 2016, taking the opportunity of running programs on multiple platforms, Google joined big data companies such as Talend, Data Artisans, and Cloudera to develop an SDK based on the Dataflow Model's ideas and contributed it to the Apache Software Foundation. And where does the name Apache Beam come from? As the image at the top of this article shows, Beam means a framework that unifies batch and stream processing. At present Beam supports Java, Python, Go, and more.

![](https://static001.geekbang.org/infoq/f2/f20a024f3680decadda87da56cbf036a.png)

Through Apache Beam we can ultimately write data processing logic in the programming language we like, against one unified set of Beam Model data processing APIs, and put it on different runners to execute: write once, run anywhere.

The papers:

MapReduce: https://research.google.com/archive/mapreduce-osdi04.pdf
FlumeJava: https://research.google.com/pubs/archive/35650.pdf
MillWheel: https://research.google.com/pubs/archive/41378.pdf
Dataflow Model: https://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
### 3.2 PCollection Characteristics

PCollection means Parallel Collection: a data set that can be computed in parallel. If you know Spark, you will find that PCollection resembles RDD. In Beam's data model almost all data can be expressed as a PCollection; for example, data flowing between complex operations is carried by PCollections.

```java
PCollection<String> lines = pipeline.apply(
    TextIO.read().from("url").withHintMatchesManyFiles());
```

**A PCollection needs Coders:**

The whole Beam computation ultimately runs on a distributed system, so all data may be transferred between nodes over the network and must be serializable. There are two ways to provide a Coder: register it in the global CoderRegistry, or specify it manually after each transform.

```java
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);
// Global: register the coder in the CoderRegistry.
CoderRegistry cr = p.getCoderRegistry();
cr.registerCoderForClass(Integer.class, BigEndianIntegerCoder.of());
// Manual: use one of Beam's built-in coders, or a custom one.
p.apply(Create.of(list))
    .setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()));
```

**A PCollection is unordered:**

The unordered nature of PCollection is also tied to its distributed essence: a PCollection is distributed without order and processed asynchronously, which preserves performance.

**A PCollection has no fixed size:**

Batch and streaming differ in being bounded versus unbounded data, and because Beam covers both, a PCollection places no limit on its capacity. In the implementation, Beam uses windows to split continuously updated unbounded data, so a stream can be continuously divided into small chunks.
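The window mechanism just described can be sketched as follows. This is a minimal example rather than code from the original article: bounded test data with explicit timestamps stands in for a real unbounded stream, and the one-minute window length is arbitrary.

```java
Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

// The first two events fall into the window [0s, 60s), the third into
// [60s, 120s), so Count.perElement counts them separately per window.
PCollection<KV<String, Long>> counts = pipeline
    .apply(Create.timestamped(
        TimestampedValue.of("click", new Instant(0L)),
        TimestampedValue.of("click", new Instant(30_000L)),
        TimestampedValue.of("click", new Instant(90_000L))))
    .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
    .apply(Count.perElement());
```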
**A PCollection is immutable:**

A PCollection offers no way to modify the data it carries; to change it, you can only apply a Transform and generate a new PCollection. Beam PCollections are all lazily evaluated: for performance, an execution plan is generated at the end and can be run anywhere.

Extension: readers familiar with functional programming will notice that some PCollection characteristics have much in common with functional principles, because PCollection is abstracted from exactly that paradigm; to keep performance high there is no heavy mutation machinery that could leak resources throughout compilation and execution.
## 4. Pipeline

In Beam, all data processing logic is abstracted into a data pipeline (Pipeline) to run. Simply put, it is the flow of reading a data set and transforming it into the result data set you want.

Example 1:

```java
PipelineOptions options = PipelineOptionsFactory.create();
// Set the execution engine; DirectRunner is the local engine, with limited
// resources and a cap on maximum parallelism.
options.setRunner(DirectRunner.class);

Pipeline pipeline = Pipeline.create(options);

List<KV<String, String>> kvs = new ArrayList<>();
for (int i = 0; i < 10; i++) {
    String id = "id:" + i;
    String name = "name:" + i;
    kvs.add(KV.of(id, name));
}
// 1. Create the data set; 2. filter with Filter.by; 3. write to the database via JdbcIO.
pipeline.apply(Create.of(kvs))
    .setCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of()))
    .apply(Filter.by((KV<String, String> kv) -> kv.getKey().equals("id:1")))
    .apply(JdbcIO.<KV<String, String>>write()
        .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create("數據連接池"))
        .withStatement("sql")
        .withPreparedStatementSetter((element, statement) -> {
            statement.setString(1, element.getKey());
            statement.setString(2, element.getValue());
        }));

// Run.
pipeline.run().waitUntilFinish();
```

## 5. Transform

The most basic unit of data processing in Beam is the Transform. Beam provides the most common Transform interfaces, such as ParDo and GroupByKey, of which ParDo is the more commonly used.

ParDo means Parallel Do and expresses a very generic data processing operation; GroupByKey means gathering a Key/Value data set by key.

> **Note:** You can implement GroupByKey with ParDo; one simple approach is to keep a global hash table and have the ParDo insert elements into it one by one. But this implementation is unusable in practice, because your data volume may not fit into an in-memory hash table at all.
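A minimal GroupByKey sketch, with made-up key/value data: all values sharing a key end up in one Iterable, and distributing keys across workers is handled by the framework rather than by user code, which is what makes it viable where the in-memory hash table above is not.

```java
Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

// Gathers every value that shares a key into a single Iterable per key:
// "a" maps to the values 1 and 3 (in no guaranteed order), "b" to 2.
PCollection<KV<String, Iterable<Integer>>> grouped = pipeline
    .apply(Create.of(KV.of("a", 1), KV.of("b", 2), KV.of("a", 3)))
    .apply(GroupByKey.create());
```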
To use ParDo you subclass the DoFn class it provides. You can regard the DoFn as part of the ParDo; the Transform is a conceptual method that contains the transformation operations.

```java
// The DoFn template.
static class DoFnTest<T> extends DoFn<T, T> {
    @Setup
    public void setUp() { ... }

    @StartBundle
    public void startBundle() { ... }

    @ProcessElement
    public void processElement(ProcessContext c) { ... }

    @FinishBundle
    public void finishBundle() { ... }

    @Teardown
    public void teardown() { ... }
}
```

When a Transform is processed, the data (PCollection) is serialized, and the pipeline the Transform is registered with splits the elements of the Transform's input PCollection into different bundles, then distributes those bundles to different workers for processing. Exactly how many workers a Beam pipeline allocates, and into how many bundles a PCollection is split, is effectively random: it depends on the execution engine and touches on each engine's dynamic resource allocation, which you can research separately.

When a Transform invokes a DoFn: @Setup initializes resources, and @Teardown runs after the instance has been used, cleaning up resources to prevent leaks. @StartBundle is tied to a bundle; within a bundle, @ProcessElement is invoked on each input element (similar to a map over each input row); if the DoFn provides @FinishBundle, it is called once the bundle's data has been fully streamed through. After @FinishBundle completes, and before the next @StartBundle call, the framework will not invoke @ProcessElement or @FinishBundle again.

If an error occurs partway through a bundle, that is, any element in the bundle fails for any reason, the entire bundle must be reprocessed.

In a multi-step Transform, if an element in a bundle fails, then the whole bundle containing that element, plus all bundles associated with that bundle, must be reprocessed.

This is Beam's data pipeline processing model.
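Wiring a DoFn into a pipeline goes through ParDo.of. A minimal sketch, where the trimming logic is only an illustration: only @ProcessElement is overridden, since the other lifecycle methods shown in the template above are optional.

```java
Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

// ParDo applies the DoFn's @ProcessElement to every element of the input.
PCollection<String> trimmed = pipeline
    .apply(Create.of(" a ", "b "))
    .apply(ParDo.of(new DoFn<String, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            c.output(c.element().trim());
        }
    }));
```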
## 6. Pipeline I/O

Reading data sets is done with Pipeline I/O.

![](https://static001.geekbang.org/infoq/fe/fe259847ea0507189a0d333d7ae39d6c.png)

Reading a data set:

An input data set is usually read through a Read Transform. A Read Transform reads data from an external source, which can be a file on the local machine, data in a database, a file object on cloud storage, or even message data on a stream. The return value of a Read Transform is a PCollection, which can then serve as an input data set and be used with any Transform. A Beam pipeline places no restriction on when the user calls a Read Transform: we can call it at the very start of the pipeline, or call it again after N Transform steps to read another input data set.

```java
// A file.
PCollection<String> inputs = p.apply(TextIO.read().from(filepath));
// Beam's io package ships around 34 kinds of read streams; you can also define your own IO.
// A JDBC IO stream.
PCollection<Row> t1 = pipeline.apply(JdbcIO.<Row>read()
    .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(cpds))
    .withCoder(SchemaCoder.of(type))
    .withQuery("select * from template1")
    .withRowMapper(new JdbcIO.RowMapper<Row>() { ... }));
```

Writing a data set out:

Writing the result data set to a destination address is done through a Write Transform. A Write Transform outputs the result data set to an external source. Mirroring the Read Transform, any external source a Read Transform can support, a Write Transform supports too. In a Beam pipeline, a Write Transform can output the result data set at any step, so users can emit any intermediate result produced in a multi-step chain of Transforms.

```java
p.apply(TextIO.write().to("url").withSuffix("file suffix"));
```

## 7. About the Author

Li Meng works at Zhiyin Intelligence Data Technology Co., Ltd., where he is responsible for the infrastructure design of the data middle platform's data engine and for middleware development, focusing on cloud computing and big data.

Blog: https://blog.csdn.net/qq_19968255

More on Beam: https://gitbook.cn/gitchat/activity/5dad728e7c3fea79dbc619a4