

Academic Talks

Efficiently Running AI Workloads Using Long SIMD and Matrix ISAs


Speaker: Marc Casas Guix, Barcelona Supercomputing Center

Time: October 7, 2024, 9:30-11:30

Venue: Room B1421, Main Building

Host: Weifeng Liu


Speaker Bio:

Marc Casas is a technical research lead at the Barcelona Supercomputing Center (BSC) and a lecturer at the Universitat Politècnica de Catalunya (UPC). His research lies between computer architecture (e.g., memory address translation and vector architectures) and high-performance computing (e.g., sparse linear algebra and parallel deep learning). He is the technical lead of the SONAR (parallel SOftware and New ARchitectures) research group, composed of PhD students, engineers, and postdocs. Marc has led BSC contributions to several European projects (Mont-Blanc 2020, European Processor Initiative, etc.) and research collaborations with Intel and IBM.

Marc has been at BSC since 2013. He was a postdoctoral research scholar at the Lawrence Livermore National Laboratory (LLNL) from 2010 to 2013. He received the Marie Curie and Ramón y Cajal Fellowships in 2014 and 2018, respectively. He obtained a 5-year degree in mathematics in 2004 and a PhD in Computer Science in 2010 from the Universitat Politècnica de Catalunya (UPC).


Abstract:

This talk will show how state-of-the-art proposals to compute convolutions on architectures whose CPUs support SIMD instructions deliver poor performance for long SIMD lengths due to frequent cache conflict misses. The talk will propose new algorithmic approaches that mitigate this limitation in two ways: adapting the amount of computation exposed to the microarchitecture, which reduces cache misses, and redefining the activation memory layout, which improves the memory access pattern. These algorithmic approaches motivate the Matrix Tile Extension (MTE), a novel matrix Instruction-Set Architecture (ISA) that completely decouples the instruction set architecture from the microarchitecture and seamlessly interacts with existing vector ISAs. MTE incurs minimal implementation overhead, since it requires only a few additional instructions and a 64-bit Control Status Register (CSR) to keep its state, and it outperforms the best state-of-the-art matrix ISA by 1.20x.
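The abstract attributes the slowdown to cache conflict misses and points to the activation memory layout as one lever. As a generic illustration of that mechanism (not the method presented in the talk), the sketch below computes which cache set each access maps to when a kernel reads many activation rows whose size is a power of two, and shows how padding each row by one cache line spreads those accesses across sets. The cache geometry (32 KiB, 8-way, 64-byte lines), the row sizes, and all function names are hypothetical assumptions chosen for illustration.

```python
# Hypothetical L1 data-cache geometry (an assumption, not taken from the talk):
# 32 KiB capacity, 8-way set associative, 64-byte lines -> 64 sets.
LINE_BYTES = 64
WAYS = 8
CACHE_BYTES = 32 * 1024
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)


def set_index(addr: int) -> int:
    """Cache set selected by a byte address under simple modulo indexing."""
    return (addr // LINE_BYTES) % NUM_SETS


def distinct_sets(row_bytes: int, rows: int) -> int:
    """Number of distinct sets hit when reading the first element of
    `rows` consecutive activation rows, each `row_bytes` long."""
    return len({set_index(r * row_bytes) for r in range(rows)})


def conflicts(row_bytes: int, rows: int) -> bool:
    """True if some set must hold more lines than it has ways, so the
    rows read together cannot all stay resident (conflict misses)."""
    counts = {}
    for r in range(rows):
        s = set_index(r * row_bytes)
        counts[s] = counts.get(s, 0) + 1
    return max(counts.values()) > WAYS


if __name__ == "__main__":
    rows = 16                    # hypothetical number of rows read together
    row = 1024 * 4               # 1024 fp32 values = 4096 bytes (power of two)
    padded = row + LINE_BYTES    # layout tweak: pad each row by one cache line
    print("unpadded:", distinct_sets(row, rows), "set(s), conflict =", conflicts(row, rows))
    print("padded:  ", distinct_sets(padded, rows), "set(s), conflict =", conflicts(padded, rows))
```

Under these assumptions, all 16 unpadded rows map to a single 8-way set, while the padded rows land in 16 different sets. Padding is only one generic layout change; the activation layout proposed in the talk is not reproduced here.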