Cluster


Talk Summary

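A cluster is many computers put together. They may be scattered across different places, yet they feel to the user like a single computer. Clusters have taken shape for three reasons. First, very fast processors: the machines are fast, and new products appear very quickly. Second, very fast communication: because it is so fast, we can link the machines into a very fast parallel computer. Third, a standard, very stable OS that can run on them. As of today, all three factors are quite mature. A cluster has several benefits: high performance, scalability, high throughput, system availability, and cost effectiveness. A crossbar connection can now reach gigabits per second, which leads to a curious situation: reading a local disk is slower than reading remote memory. In that case, why not use the memory of the computer next to mine as my backup? This is a basic idea behind clusters. Every computer has a complete OS, microprocessor, and memory; I only need to tie them together and run good middleware on top. That is our cluster.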

Middleware is software that makes the system feel to the user like a parallel processor or a single system, and it delivers good performance. The point of a single system image is to give users the illusion that they are working on one single system. It includes a single entry point, a single file hierarchy, a single control point, a single memory space, and so on.

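Software on clusters also matters, and quite a lot of it is already on the market. In the future, cluster software will manage these components: when one computer is about to die, I want to know ahead of time and notify the system manager to fix it; if it really dies, how do I reconfigure it? These are management issues. According to a Berkeley study, a system that has died simply stops, but before dying a computer struggles: it behaves erratically for quite a long time before it finally fails, and that unstable period affects us.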

Full Talk

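What is a cluster? It is many computers put together, though they can also be spread across different places. The key point is that together they feel to us like one independent, well-balanced machine. The whole system is called a cluster. Today I will first explain briefly why we study this topic and what it means, and then we will turn to computation. We will look at today's clusters from several angles, go through the different categories, and analyze their complexity. What do we need in order to put a cluster together? One very important thing is a layer of software in the middle, called middleware. There is software, there is hardware, and this is middleware: the software system that sits in the middle and connects them. Because of time, I may not go into much detail today. The word "middleware" is a nice example of how English can be fun: you can join two words to make a new one. Software and hardware are just "soft" plus "ware" and "hard" plus "ware".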

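What operating system do we usually run on a cluster? There are many choices, but they do not include Windows 98 or Windows NT. Usually it is Linux or FreeBSD, and better still, they are open source. Isn't that nice? With these in hand we can start putting a cluster together. I have rarely heard of anyone building a cluster on NT. Some do, but they are the oddballs; the mainstream is still Linux and FreeBSD. Speaking of which, I like to ask students a question: once Windows 2000 comes out, will Linux still have room to survive? Everything is a Windows environment now. We are in a gap period: Windows 2000 is said to be delayed until next February. It has been announced for a long time but has never shipped, and Linux has benefited from that. Once Windows 2000 is out, will anyone still use Linux? That is hard to say. If any of you know Linux well and are interested in it, there is something you could do for me. Sun has an office suite called StarOffice, but it cannot handle Chinese. If it supported Chinese, I would replace Windows 98 with Linux right away. I know many people are waiting for the day they can use an office suite on Linux. The main reason so many people use Windows 98 is that it has Office; my presentation today is done in Office. If I could run something like Office on Linux, many people would switch without hesitation. So if any of you have that ambition, I strongly encourage you to work on Chinese localization for Linux and bring an office suite to it. When you are done, tell me right away.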

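As I said, clusters have taken shape for three reasons. First, very fast processors: the machines are fast, and new products appear very quickly. Second, very fast communication: because it is so fast, we can link the machines into a very fast parallel computer. Third, a standard, very stable OS that can run on them. These are the three main factors, and with the three combined we can build clusters. Today all three are mature, so we feel clusters have reached a turning point. Clusters used to be purely experimental; now they are starting to take off. We can also see another possibility: services on the Internet. Many Internet services can make use of clusters. They need a large amount of computation and very fast I/O. Moreover, the Internet is a client-server environment, and the server side needs fast, high-volume computation.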

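Two or three years ago some students in my lab wrote a game called 萬王之王 (King of Kings), built as a MUD. For a while it had several thousand people online at the same time on a given day. Back then they had only one PC, and our department chair often went to look at that PC, because the computing center charges us for all traffic coming into the department. With thousands of people coming in every day to play 萬王之王, the department would lose money. By then I had already been promoted to full professor, so I did not pay him much attention. But another professor was also playing MUDs at the time, and the chair was even less happy about that; that one was an associate professor. In the end we skimmed a "cut" off the National Science Council research funds to cover the cost; in other words, the other professors paid for us. So what do you do when 萬王之王 grows to thousands of players? One computer is no longer enough, so you connect several computers together.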

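萬王之王 is sold commercially today and still keeps this basic architecture: two or three hosts, and you connect through a gateway. Inside the game there are separate kingdoms, and my kingdom may be at odds with yours. I played once and found it too bloody: you see a child and you are supposed to kill it. In any case, the cluster maps one kingdom to one computer: this kingdom lives on this machine, that kingdom on another. Walking from one kingdom to another involves communication, which is a very typical cluster problem. When I connect I enter a user ID, and it is the same ID no matter where I am. I typed in my name, looked around, and saw names like DJ and Sherlock Holmes. Nobody used a real name except me, and I realized this was not a place for me.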

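Communication has to be fast here; it is the key. We have had very fast microprocessors for some time, but what you may not know is that switches are now improving even faster. Look at a switch like the one in this photo: it is extremely fast, and its crossbar connection can reach the gigabit-per-second level. Communication inside a cluster used to be too slow; now we have the situation that reading a local disk is slower than reading remote memory. My hard disk is slower than fetching from the memory of the computer next door. In that case, why not use the neighboring computer's memory as my backup? This is a basic idea behind clusters. Every node in a cluster is a computer with a complete OS, microprocessor, and memory; I only need to tie them together and run good middleware on top, and that is our cluster. Many schools now say they have a cluster environment, and that is nothing special: connect a lot of PCs, run PVM on them, and those computers are a cluster. There is no great science to it.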
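The claim that remote memory can beat a local disk can be checked with a back-of-envelope calculation. The sketch below is only illustrative: the gigabit link speed comes from the talk, while the page size, network latency, disk seek time, and disk transfer rate are assumed round numbers, not measurements.

```python
# Back-of-envelope comparison; every hardware figure except the link
# speed is an assumption chosen for illustration, not a measurement.
PAGE_BYTES = 4096                 # assumed page size
LINK_BITS_PER_SEC = 1e9           # "gigabit per second" link from the talk
NETWORK_LATENCY_SEC = 100e-6      # assumed software + switch latency
DISK_SEEK_SEC = 10e-3             # assumed seek + rotational delay
DISK_BYTES_PER_SEC = 10e6         # assumed sustained disk transfer rate


def remote_memory_fetch_time(nbytes: int) -> float:
    """Seconds to pull nbytes from a neighbor's memory over the network."""
    return NETWORK_LATENCY_SEC + nbytes * 8 / LINK_BITS_PER_SEC


def local_disk_fetch_time(nbytes: int) -> float:
    """Seconds to read nbytes from a random position on the local disk."""
    return DISK_SEEK_SEC + nbytes / DISK_BYTES_PER_SEC


if __name__ == "__main__":
    remote = remote_memory_fetch_time(PAGE_BYTES)
    disk = local_disk_fetch_time(PAGE_BYTES)
    print(f"remote memory: {remote * 1e6:.1f} us")
    print(f"local disk:    {disk * 1e6:.1f} us")
    print(f"disk is {disk / remote:.0f}x slower under these assumptions")
```

With these assumed figures the disk's mechanical delay dominates, which is the effect the talk describes; different hardware would shift the ratio.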

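So what is all this good for? Clusters can do parallel processing, which many problems in physics and chemistry need. Those groups no longer have to go to a supercomputing center; they can put a few PCs together in their own lab and have a cluster.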

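Let me say a little about what middleware is. Middleware is software that makes the system feel to the user like a parallel processor or a single system while delivering good performance. How does it provide a single system image? The main goal is to give users the illusion that the machine in front of them is one single system; I will not dwell on the benefits. What counts as a single system image? Suppose my cluster has ten computers. To log in, I should not have to name the first computer to log in to the first, and name the second to log in to the second. I just log in to the cluster as a whole, through the cluster master. To the outside, the ten computers have one IP address, and through that address I can use the resources of all ten. Once inside, the ten different file systems appear under a single root; where that root is stored is not my concern. That is also a single system image. Next is the single control point: I can monitor all ten computers from one place. That is another example of a single system image.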
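The single entry point idea (ten machines behind one address) can be sketched as a small dispatcher. This is a minimal illustration, not how any particular cluster master works: the class name `SingleEntryPoint`, the round-robin policy, and the host names are all hypothetical.

```python
from itertools import cycle
from typing import Iterator, List


class SingleEntryPoint:
    """Hands out back-end nodes behind one public address (round-robin)."""

    def __init__(self, public_address: str, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("a cluster needs at least one node")
        self.public_address = public_address
        self._nodes: Iterator[str] = cycle(nodes)

    def login(self, user: str) -> str:
        """Return the hidden node that will serve this login session."""
        return next(self._nodes)


if __name__ == "__main__":
    # Hypothetical address and node names, for illustration only.
    entry = SingleEntryPoint(
        "cluster.example.org", [f"node{i:02d}" for i in range(1, 11)]
    )
    for user in ("alice", "bob", "carol"):
        print(user, "->", entry.login(user))
```

The user only ever sees `public_address`; which node actually serves the session is the middleware's decision.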

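So I can sit in front of one computer and control everything. Then there is single virtual networking: although at any given moment I may be using two or three Ethernet links, they appear as one single network. Another kind of single system image is the single memory space: instead of seeing ten separate memory spaces, I see one very large memory space ten times the size. I can then freely access any location from my program. To read data I do not have to say which machine's memory it is in; I just say I want to read location a, and the system translates automatically. There is also single job management: to put my work on the cluster, I do not have to place one job here and another there; I simply tell the system I am submitting a job. The two examples shown here are both commercial products that put a job onto the cluster for you.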
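The automatic translation behind a single memory space can be sketched as follows. This is a toy model under stated assumptions: every node contributes the same amount of memory, addresses are word indices, and the class name `SingleMemorySpace` is made up for illustration. Real distributed shared memory systems work at page granularity and must also handle coherence, which this sketch ignores.

```python
from typing import List, Tuple


class SingleMemorySpace:
    """Maps one flat address range onto equal-sized per-node memories."""

    def __init__(self, num_nodes: int, words_per_node: int) -> None:
        self.words_per_node = words_per_node
        # Each inner list stands in for one node's physical memory.
        self._memories: List[List[int]] = [
            [0] * words_per_node for _ in range(num_nodes)
        ]

    def translate(self, address: int) -> Tuple[int, int]:
        """Turn a global address into (node index, local offset)."""
        total = len(self._memories) * self.words_per_node
        if not 0 <= address < total:
            raise IndexError(f"address {address} outside 0..{total - 1}")
        return divmod(address, self.words_per_node)

    def read(self, address: int) -> int:
        node, offset = self.translate(address)
        return self._memories[node][offset]

    def write(self, address: int, value: int) -> None:
        node, offset = self.translate(address)
        self._memories[node][offset] = value


if __name__ == "__main__":
    memory = SingleMemorySpace(num_nodes=10, words_per_node=1024)
    memory.write(2050, 7)           # the program names only the location
    print(memory.translate(2050))   # the system works out node and offset
    print(memory.read(2050))
```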

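Take LSF. I was recently getting a price quote at the high-speed computing center. Bought individually, the software reportedly costs 2,000 per "load" (that is, per license unit), so ten come to 20,000, which is actually very expensive. So cluster work is not confined to the lab: companies are already selling this kind of software, and at a high price, NT$2,000 per license. LSF itself started in a university lab. Its originator was a well-known professor at the University of Toronto who went on to develop the software commercially; it clearly has commercial value. A single user interface means I do not need to look at ten machines: I open one simple window environment and can follow the state of all ten computers. These are all examples of a single system image.

The question is how to use software to provide these services so that users perceive them. We cannot expect to support all of them, although ideally one would. This company concentrates on the management side, while other companies work on the single memory space; one of those also came out of a university and sells its product. So much of this has been commercialized. Recently everyone has seen the business opportunity, and many professors are eager to start companies. As long as they have some programs written by graduate students, good or bad, they want to start a company, so much of what we see was written by students. Professors in Taiwan are affected too: this one starts a company, that one starts a company. Whoever gets into a professor's company is one of his most capable people; no need to ask, that student is certainly a top one.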

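A single system image can be implemented at different levels. I can modify the OS kernel to provide the information just described. I can leave the kernel untouched and do everything at the application (user) level, which is the most elegant approach. It can also be done entirely in hardware; VIA, mentioned earlier, is implemented with something close to hardware. There are other examples of a single system image. I can provide a single I/O space, in which all the printer ports belong to one space. In other words, each of the ten computers has its own printer, and I can access any of them at any time without specifying which machine; each has its own name, and I just give a name and the job is sent. A single I/O space also means I can join all the hard disks into one very large disk, which takes even more skill. With a single process space, processes on different computers all have distinct process IDs, so I can treat any process as if it were running on my local CPU. These are the ideal features a cluster provides.

I can also do checkpointing: record the operating state of the ten computers so that when one dies, it can be recovered. There are many ways to checkpoint: I can write the current process space to the hard disk, or I can save it to the computer next door. My own computer here does something similar: when its battery is about to run out, it automatically moves the contents of its 32 MB of memory to another machine, and once it is recharged it brings them back from that machine. The worst case is this: I am at school looking at a photo of a pretty woman and the power runs out. After recharging, the machine comes back up by itself, and this time my wife is next to me. So before the computer resumes, check whether your wife is standing beside you.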
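The disk variant of checkpointing can be sketched at application level. This is a simplified illustration, not a process-space checkpoint as described in the talk: it saves only a small state dictionary that the program chooses to expose, and the function names and file layout are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


def save_checkpoint(state: Dict[str, Any], path: str) -> None:
    """Write state to path atomically so a crash never leaves half a file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(state, tmp)
        os.replace(tmp_path, path)  # atomic rename over the old checkpoint
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Read back the most recent checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)


def run_job(total_steps: int, checkpoint_path: str, every: int = 100) -> int:
    """Sum 0..total_steps-1, resuming from the last checkpoint if one exists."""
    if os.path.exists(checkpoint_path):
        state = load_checkpoint(checkpoint_path)
    else:
        state = {"step": 0, "partial_sum": 0}
    while state["step"] < total_steps:
        state["partial_sum"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, checkpoint_path)
    return state["partial_sum"]
```

If the machine dies mid-run, calling `run_job` again with the same path repeats at most `every` steps. Saving to a neighbor's memory instead of the disk would replace the file write with a network send, which this sketch does not attempt.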

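We can discuss middleware at different levels. As I said, at the programming level I can give you a shared-memory environment; this is already on sale, as a product called "frame mark" (probably TreadMarks). On top of that sits job management such as LSF, and applications sit above that; a job goes through job management and is sent to different systems. The application does not have to care where in memory the data it accesses lives. Underneath I can provide a single I/O space, checkpointing, and a single process space, so that it all feels like a single system even though there are many systems below. It is a layered architecture. A few years ago people thought the overhead of stacking layers like this was too large, but nobody dares say so now, because CPUs have become so fast that we can add things on top and it still runs fast. In fact I think computers advancing this quickly is not entirely a good thing. A company like Microsoft does not need to optimize much when it releases software, because CPUs will catch up shortly, so Windows software keeps getting bigger and much of what is in it is useless. There are several ways to provide a single system image: we can modify the inside of the OS, or we can make the changes on top of the OS.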
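The job management layer described above can be sketched as a tiny dispatcher that sends each job to the least loaded node. This is only an illustration of the idea and makes no claim about how LSF schedules work: the function name, the greedy largest-first heuristic, and the notion of a known job cost are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


def dispatch_jobs(job_costs: Dict[str, float], nodes: List[str]) -> Dict[str, str]:
    """Assign each job to the node with the least work so far.

    Jobs are placed largest first (a greedy heuristic, not an optimal
    schedule). Returns a mapping of job name to node name.
    """
    if not nodes:
        raise ValueError("no nodes available")
    # Heap of (accumulated load, node name); ties break by node name.
    heap: List[Tuple[float, str]] = [(0.0, n) for n in sorted(nodes)]
    heapq.heapify(heap)
    placement: Dict[str, str] = {}
    for job, cost in sorted(job_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        placement[job] = node
        heapq.heappush(heap, (load + cost, node))
    return placement


if __name__ == "__main__":
    # Hypothetical jobs with estimated costs in hours.
    jobs = {"a": 3.0, "b": 2.0, "c": 2.0, "d": 1.0}
    print(dispatch_jobs(jobs, ["n1", "n2"]))
```

The user submits jobs to one place and never names a node, which is the single job management view from the talk.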

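As for cluster systems, here are some representative ones. One, recorded in the transcript as "ratio" (possibly Beowulf), came out of Caltech and NASA. HPVM is a virtual machine project being developed by a professor at the University of Illinois; it provides very fast communication. "Garden" is another. Then there is UC Berkeley's best-known project, NOW (Network of Workstations); the technology developed under that project has by now been lost. Some commercial software is on sale as well; I mention it only to show that clusters are not confined to the lab. These are some of the more representative cluster efforts.

Finally, where did my slides come from? I downloaded them from a website. The IEEE, the international society for electrical and electronics engineers, has a group under it devoted to clusters. It brings together people interested in cluster computing to share information. The person leading it is an interesting character and enthusiastic to talk to; he is actually a PhD student who has not graduated yet and spends all his time on these outside activities. I wonder whether his advisor minds. If I were his advisor I probably would not let him graduate. But right now I need him: I go through him to find papers. The organization runs a mailing list for discussion and holds workshops. At next year's workshop in Japan, he and I will lead a discussion on cluster applications on the Internet and collect papers on how to apply clusters to the Internet. There are conferences too: on December 1 this year there is a workshop in Melbourne, Australia, where some of these people will meet. If you are interested, go to the website and look for material. I get along with him well, so I volunteered to serve as coordinator for Taiwan.

There is quite a lot of cluster work in Taiwan as well. There are two websites here. One is the computing center at Academia Sinica, which has done many projects for groups inside Academia Sinica, such as physics and chemistry, that need high performance computing. The person in charge there works mainly on Linux and is often seen around. It is an interesting place.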

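The other is the National Center for High-performance Computing (NCHC), which runs a PC cluster project. The cluster I am showing now is theirs. They had some PCs and wanted to connect them. No PC has its own monitor or keyboard, but there is a server. What you see at the back is a switch cable: there is really only one keyboard and one monitor, attached through a special switch, and by flipping the switch the operator can control a particular PC directly. So to check on one machine, besides connecting over the network, he can flip the switch at the back, and the keyboard and monitor become that machine's keyboard and monitor.

The system built so far has two servers and 16 computers with a single CPU each, plus another 8 computers with dual CPUs. It was purchased around the end of last year and finished at the end of January this year. The CPUs bought then were Pentium II 450, with DRAM and a hard disk, and each computer has two Ethernet cards, one for communication inside the cluster and one for communication with the outside. This year there is another plan to spend about NT$2 million, which should buy Pentium III 500 MHz computers, and they will buy other equipment as well. The compute part will then become 32 single-CPU computers, each a Pentium III 500 MHz, while the original 16 single-CPU machines will be merged with the dual-CPU ones to form 16 dual-CPU computers with Pentium II 400 MHz. So they claim 64 CPUs in the cluster. The machines can be connected with stacked Ethernet hubs so that they all appear to be on one network.

This also shows what government procurement is like. When they budgeted this year's purchase they estimated NT$60,000 per machine, probably for a Pentium III 400 MHz, but because of administrative delays and the earthquake, by the time the money arrived it could buy a Pentium III 600 MHz plus a switch. The OS installed is Red Hat Linux 5.2, and all the software is free and can be downloaded from the net, so the whole cluster does not cost much: the plan was to spend NT$2 million on hardware, and everything else was available off the shelf.

Once it was running, many schools wanted the technology transferred to them. From a computer science point of view the transfer involves no great science, but people in departments such as physics and chemistry think it is very hard, so they need our help. The problems we need to solve include synchronization. Suppose a user submits 8 jobs, one on each of 8 PCs, expected to run for three days. Halfway through, one PC suddenly dies. What then? Should the jobs on the other 7 computers all be killed? At present there is no automatic way to kill them, and the user cannot do it, so the user has to phone the system manager, who logs in to each computer and kills each one. We badly need software that detects which computer has died and gives the user a command to kill the rest. This is a good example of the software a cluster needs to be usable. Building a cluster is not hard; managing it is harder and costs a lot.

Then there is power consumption. Today the number of computers is small and affordable, but next year, with more machines, it becomes a problem. To expand to 128 or 256 machines, floor space alone is a major issue. These are all problems clusters will face.

So that is my report on clusters, quite an interesting topic. We simply put many off-the-shelf things together and can do a great deal; making the result easier to use is the challenge. In the future we will come up with better and better ways to steal idle resources from computers: they are sitting there doing nothing anyway, so we should make them do some work. It is also likely that each node in a cluster will have many CPUs. With many CPUs the problem is the degree of parallelism, which has several levels: I can parallelize across 4 or 8 CPUs, across all 16 machines of the cluster, and with sites abroad, so there are three or more levels. Partitioning the job is the biggest problem. Also, most computers will start using Gigabit Ethernet, which makes the links faster, but it also requires software that bypasses the OS: as I said, OS overhead is too large, and we must go around it to make communication fast. That is another big problem. Unix-like systems such as Linux will become more and more common. NT will not, unless Microsoft opens its source code, and I believe Microsoft will never do that; even if it did, the code would be hard to trace.

The challenge is how to make full use of your cluster. We need software layered on top, and we need ways to manage the components. When one computer is about to die, I want to know beforehand and notify the system manager to fix it; if it really dies, how do I reconfigure it? These are management issues. According to the Berkeley study, a system that has died simply stops, but before dying a computer struggles: it behaves erratically for quite a long time before it finally fails, and that unstable period affects us. How do you detect the instability in advance? That takes real skill, and at present nobody is quite sure when a machine should be called dead; this needs research. Berkeley also found that many systems fail not because the hard disk or motherboard died but because an Ethernet cable broke or a connector came loose. Problems often arise in places like that, which is why we want an automatic way to find them. These are all challenges. They may not look like deep science, but a large system cannot operate without addressing them. Future clusters will use 1,000 PCs. Can you imagine how to manage 1,000 PCs sitting in one room? All of this management is a challenge, and the problems appear once the scale grows. Why would a company need 1,000 computers? Suppose our company reaches the point where it needs another company to do its backups. That company needs many large computers linked together that go around every night backing up other people's data, and those computers need to be a cluster.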
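The two management needs raised in the talk, noticing that a node is dying and cleaning up the rest of a job when a node is lost, can be sketched with a heartbeat monitor. This is a minimal sketch under assumptions: the class and function names are hypothetical, nodes are assumed to report heartbeats to one place, the "suspect" threshold is a crude stand-in for detecting the unstable period, and the code only computes what should be killed rather than contacting any machine.

```python
from typing import Dict, List, Set


class HeartbeatMonitor:
    """Tracks the last heartbeat time of each node and flags silent ones."""

    def __init__(self, nodes: List[str], timeout: float, suspect_after: float) -> None:
        if not 0 < suspect_after < timeout:
            raise ValueError("need 0 < suspect_after < timeout")
        self.timeout = timeout
        self.suspect_after = suspect_after
        self._last_seen: Dict[str, float] = {n: 0.0 for n in nodes}

    def heartbeat(self, node: str, now: float) -> None:
        """Record that node reported in at time now (seconds)."""
        self._last_seen[node] = now

    def status(self, node: str, now: float) -> str:
        """Classify a node as 'alive', 'suspect', or 'dead' by its silence."""
        silence = now - self._last_seen[node]
        if silence >= self.timeout:
            return "dead"
        if silence >= self.suspect_after:
            return "suspect"
        return "alive"

    def dead_nodes(self, now: float) -> Set[str]:
        return {n for n in self._last_seen if self.status(n, now) == "dead"}


def nodes_to_kill(job_nodes: Dict[str, List[str]], dead: Set[str]) -> Dict[str, List[str]]:
    """For every job that lost a node, list the surviving nodes to clean up."""
    plan: Dict[str, List[str]] = {}
    for job, nodes in job_nodes.items():
        if any(n in dead for n in nodes):
            plan[job] = [n for n in nodes if n not in dead]
    return plan


if __name__ == "__main__":
    pcs = [f"pc{i}" for i in range(1, 9)]
    monitor = HeartbeatMonitor(pcs, timeout=30.0, suspect_after=10.0)
    for pc in pcs:
        monitor.heartbeat(pc, now=100.0)
    for pc in pcs:
        if pc != "pc3":             # pc3 goes silent after t=100
            monitor.heartbeat(pc, now=130.0)
    dead = monitor.dead_nodes(now=131.0)
    print(nodes_to_kill({"job42": pcs}, dead))
```

A loose cable and a dead motherboard look the same to this monitor; telling them apart, and deciding when a node really counts as dead, is the open question the talk points to.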

Related Material

1. What is Cluster computing

Cluster computing: the state-of-the-art in theory and practice

Rapid improvements in network and processor performance are revolutionizing high-performance computing, transforming clustered commodity workstations into the supercomputing solution of choice. This book brings together contributions from more than 100 leading practitioners, offering a single source for up-to-the-minute information on virtually every key system-related issue in high-performance cluster computing. The book contains expert coverage of "commodity supercomputing" systems and architectures; Internet-based wide area "metacomputing" systems; the role of Java; new applications and algorithms; advanced techniques for enhancing availability and throughput; and much more. Discover the state-of-the-art in:

Communal multiprocessing/adaptive parallelism techniques for resource sharing

Networking, lightweight protocols, active messages, "killer switches," and I/O

Cluster middleware and resource management systems

Cluster computing programming environments, tools, and paradigms

Administering high-performance clustered systems

2. What is Cluster Computing

A commonly found computing environment consists of many workstations connected together by a local area network. The workstations, which have become increasingly powerful over the years, can together be viewed as a significant computing resource. This resource is commonly known as a cluster of workstations.

3. Distributed Computing Environment

The OSF Distributed Computing Environment (DCE) is a comprehensive, integrated set of services that supports the development, use and maintenance of distributed applications. It provides a uniform set of services, anywhere in the network, enabling applications to utilise the power of a heterogeneous network of computers.

4. Cluster computing environments

Num. | Package | Vendor | Version

(1) | Amoeba | Vrije Universiteit, Amsterdam | 5.2

(2) | Beowulf | NASA, USA | 1.2.14

(3) | BSP | Oxford Parallel, UK | 1.2

(4) | DCE - Distributed Computing Environment | OSF, USA | 1.1

(5) | DOME - Distributed Object Migration Envn. | Carnegie Mellon, USA | March '95

(6) | GLU | Stanford Research Lab., USA | June '95

(7) | LAM | Ohio Supercomputer Center, USA | 6.0

(8) | Networks Of Workstations (NOW) | Berkeley, USA | N/A

(9) | Shrimp | Princeton, USA |

(10) | Thesis | MCC, UK |

(11) | WANE | SCRI, FSU, USA | ??

5. Why Cluster Computing?

Microprocessor performance now far exceeds that of traditional supercomputers costing many times more. This cost advantage allows a system of scalable computing resources to be built in many different ways: clusters, MPPs, and SMPs to name a few. Because high-speed, low-latency standard communications technologies now rival those available from proprietary sources, the open systems cluster paradigm can flourish. Software for several layers of standard tools for distributed computing now exists. It ranges from communications protocols like UDP/IP and TCP/IP to high-level programming environments such as OSF's DCE and SunSoft's ONC+.

6. What is VPM

VPM (ventral posteromedial nucleus). The thalamic relay nucleus for somatic sensation from the head. Receives inputs from the spinal and main sensory nuclei of the trigeminal nerve (via the medial lemniscus and spinothalamic tract) and projects to somatosensory cortex.

7. What is MPI?

MPI stands for Message Passing Interface. The goal of MPI, simply stated, is to develop a widely used standard for writing message-passing programs. As such the interface attempts to establish a practical, portable, efficient, and flexible standard for message passing.

In designing MPI the MPI Forum sought to make use of the most attractive features of a number of existing message passing systems, rather than selecting one of them and adopting it as the standard. Thus, MPI has been strongly influenced by work at the IBM T. J. Watson Research Center, Intel's NX/2, Express, nCUBE's Vertex, p4, and PARMACS. Other important contributions have come from Zipcode, Chimp, PVM, Chameleon, and PICL.

The main advantages of establishing a message-passing standard are portability and ease of use. In a distributed memory communication environment in which the higher level routines and/or abstractions are built upon lower level message passing routines, the benefits of standardization are particularly apparent. Furthermore, the definition of a message passing standard provides vendors with a clearly defined base set of routines that they can implement efficiently, or in some cases provide hardware support for, thereby enhancing scalability.

8. Network Information Centre(NIC)

The NIC provides informational, administrative, and procedural support services for the users of the network. It provides the bridge between the users' needs and the technical requirements of the NOC.

The NIC:

Provides registration services such as keeping record of the names of hosts on the network and records of contacts for the hosts, networks, and domains for the NOC, with the help of departmental technical contacts and ITS liaisons.

Supports end users through direct contact by being the principal source of network information. This involves fielding user questions and when necessary referring them to the appropriate information resource or coordinating with the NOC to solve problems.

Collects and maintains information resources that help users gain access to relevant information and applications in order to allow them to make effective use of the network. (See, for example, the ~ftp/nic area on ftp.uwo.ca).

Increases awareness and understanding of network applications and services through local newsletters and other publications.

Server and Workstation Cluster Initiative To Define High-Speed Communication Interfaces

Compaq, Intel and Microsoft Host Industry Leaders to Define Technical Specification

HOUSTON, SANTA CLARA, Calif., and REDMOND, Wash. - April 16, 1997 - Compaq Computer Corp., Intel Corp., Microsoft Corp. and other industry leaders today announced an initiative to define high-speed communication interfaces for clusters of servers and workstations. Called the Virtual Interface (VI) Architecture specification, the initiative will enable a new class of scalable cluster products offering high performance, low total cost of ownership and broad applicability. More than 40 companies will participate in the process to complete the draft technical specification before its public release later this year.

A cluster is a group of computers and storage devices that function as a single system. Businesses use clusters in place of individual computers for higher availability and enterprise-class scalability. It is possible to use standard local area network (LAN) and wide area network (WAN) technology to connect the machines in a cluster. However, large clusters and high-performance applications require lower latency, higher bandwidth and additional features not offered by standard LAN and WAN technology. A system area network (SAN) is a specialized network optimized for the reliability and performance requirements of clusters.

The VI Architecture specification provides standard hardware and software interfaces for cluster communications. This will spur innovation in SAN technology and make the LAN, WAN and SAN differences transparent to the applications. The VI Architecture specification will support reliable, high-performance SANs, helping clusters achieve their full potential as cost-efficient platforms for large-scale, mission-critical applications.

"Information technology industry leaders continue to lower the cost of information processing on all fronts while enabling advanced customer solutions by bringing value-added technology to the mass market," said Britt Mayo, director of information technology at Pennzoil Company. "Their efforts to drive the creation of an industry standard for the VI Architecture will make multisystem solutions widely available at new levels of price/performance."

The VI Architecture specification will be media, processor and operating system independent. The software interface will support a variety of efficient programming models to simplify development and ensure performance. The hardware interface will be compatible with standard networks such as ATM, Ethernet and Fiber Channel as well as specialized SAN products available from a variety of vendors.

Companies that want to participate in the VI Architecture specification process should send an e-mail request for details to wire@co.intel.com. Companies today announcing their participation in the VI Architecture specification process include Adaptec, Inc., Compaq Computer Corp., Data General Corp., Digital Equipment Corp., Dolphin Interconnect Solutions Inc., Finistar Corp., GigaNet, Groupe Bull, Hewlett-Packard Co., Hitachi Ltd., Informix Software Inc., Intel Corp., International Business Machines Corp., LSI Logic, Marathon Technologies Corp., Microsoft Corp., MPI Software Technology Inc., Myricom Inc., Myrias Computer Technologies Inc., NEC Corp., Network Integrity, Novell Inc., Olivetti Personal Computers, QLogic Corp., SAP, Samsung Electronics Company Ltd.,The Santa Cruz Operation Inc., Sequent Computer Systems Inc., Siemens Nixdorf Informationssysteme AG, Siemens Pyramid Information Systems Inc., Silicon Graphics Inc., Stratus Computer Inc., Tandem Computers Inc., Toshiba Corp., University California Space Sciences Laboratory, Unisys Corp., Veritas Software and VLSI Technology Inc.

Company Backgrounds

Compaq Computer Corp., a Fortune 100 company, is the fifth largest computer company in the world and the largest global supplier of personal computers, delivering useful innovation through products that connect people with people, and people with information. Customer support and information about Compaq and its products can be found at http://www.compaq.com/ or by calling (800) OK-COMPAQ. Product information and reseller locations can be obtained by calling (800) 345-1518.

Founded in 1975, Microsoft (NASDAQ "MSFT") is the worldwide leader in software for personal computers. The company offers a wide range of products and services for business and personal use, each designed with the mission of making it easier and more enjoyable for people to take advantage of the full power of personal computing every day.

Intel, the world's largest chip maker, is also a leading manufacturer of personal computer, networking and communications products. Additional information is available at www.intel.com/pressroom.

Talk slides: cluster-intro.ppt

Related Websites

http://www.cs.wisc.edu/~arch/www/

http://www.cs.dartmouth.edu/pario/

http://yara.ecn.purdue.edu/~pplinux/Sites/

http://www.cs.umd.edu/~keleher/dsm.html

http://www.sunlabs.com/research/solaris-mc

http://www.microprocessor.sscc.ru

http://www.beowulf.org

http://www.sis.port.ac.uk/~mab/Metacomputing/

http://www.mosix.cs.huji.ac.il/

http://cnls.lanl.gov/avalon/

http://www.parnass.net/

http://www.nchc.gov.tw/RESEARCH/pccluster/index.html (National Center for High-performance Computing)

http://www.ict.ac.cn/chpc/tang/index.html

http://www.cs.nthu.edu.tw/~king/ (Prof. Chung-Ta King, 金仲達)