BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Sabre//Sabre VObject 4.5.7//EN
CALSCALE:GREGORIAN
BEGIN:VTIMEZONE
TZID:Europe/Zurich
X-LIC-LOCATION:Europe/Zurich
TZURL:http://tzurl.org/zoneinfo/Europe/Zurich
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19810329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19961027T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
UID:news238@dmi.unibas.ch
DTSTAMP:20180716T185111Z
DTSTART;TZID=Europe/Zurich:20161007T110000
SUMMARY:Seminar in Numerical Analysis: Fabio Baruffa (Leibniz-Rechenzentrum
 \, München)
DESCRIPTION:In the framework of the Intel Parallel Computing Centre at Leib
 niz Supercomputing Centre (LRZ)\, Fabio Baruffa will present recent result
 s on the performance optimization of Gadget-3 on multi- and many-core comp
 uter architectures\, including the new second-generation Intel Xeon Phi pr
 ocessor\, codenamed Knights Landing (KNL). An overview of results for node
 -level scalability\, vector efficiency and performance is presented. Our w
 ork is based on an isolated\, representative code kernel\, where threadin
 g parallelism\, data locality and vectorization efficiency were improv
 ed. The node-level parallel efficiency improved by factors ranging from 5
 x to 16x on Haswell and KNL nodes\, respectively. Moreover\, a vectorizati
 on efficiency of 80% (6.6x) on a prototypical target loop of the code is o
 btained without programming with intrinsic instructions.
X-ALT-DESC:In the framework of the Intel Parallel Computing Centre at Leib
 niz Supercomputing Centre (LRZ)\, Fabio Baruffa will present recent result
 s on the performance optimization of Gadget-3 on multi- and many-core comp
 uter architectures\, including the new second-generation Intel Xeon Phi pr
 ocessor\, codenamed Knights Landing (KNL). An overview of results for node
 -level scalability\, vector efficiency and performance is presented. Our w
 ork is based on an isolated\, representative code kernel\, where threadin
 g parallelism\, data locality and vectorization efficiency were improv
 ed. The node-level parallel efficiency improved by factors ranging from 5
 x to 16x on Haswell and KNL nodes\, respectively. Moreover\, a vectorizati
 on efficiency of 80% (6.6x) on a prototypical target loop of the code is o
 btained without programming with intrinsic instructions.
DTEND;TZID=Europe/Zurich:20161007T120000
END:VEVENT
END:VCALENDAR
