Duato, Jose

Personal Information:

Position: Main Researcher (Full Professor)
Phone or fax: +34963877007x79705
Location: Valencia
Description:

Short Curriculum Vitae

The most significant achievements of José Duato, including the transfer of his research results to industry, his publication record, his service to the profession, and his international collaborations, are summarized below:

  • On his own, he developed several adaptive routing strategies to improve interconnection network performance. These strategies are so efficient and cost-effective that some of them have been implemented in the most powerful supercomputers, including the Cray T3E and Cray Black Widow supercomputers, the Compaq Alpha 21364 microprocessor used in the Alphaserver GS320 supercomputer, and the IBM BlueGene/L supercomputer. These supercomputers were among the most powerful ones when they were launched. In particular, the IBM BlueGene/L is the most powerful supercomputer today.
  • He developed, in collaboration with Xyratex (a UK company), a technique called Regional Explicit Congestion Notification (RECN), which is the only truly scalable congestion management technique for lossless networks to date. This result has been protected with two joint patents and RECN is currently being incorporated into the most important standard for future communication systems: Advanced Switching Interconnect.
  • He collaborated with the IBM Zurich Research Laboratory (the only IBM research laboratory in Europe) from 2001 until its Communications Department was closed. The main outcomes of this collaboration are four joint patents with IBM (Xmorph, RXS, BFC, and AFC). Also, he and José Flich developed the In-Transit Buffer (ITB) routing technique. Myricom (a USA company) has included support for ITBs in its popular Myrinet network by means of a special ITB packet type.
  • He is the first author of a 500-page book published in the USA, which has become the most popular book on interconnection networks. Also, he has authored or co-authored more than 340 publications, including book chapters and papers in journals and conference proceedings. He advised or co-advised 22 PhD students.
  • He served as Associate Editor of IEEE Transactions on Computers, the oldest and most prestigious journal in the area of computers (55 years old). He also served as Associate Editor of IEEE Transactions on Parallel and Distributed Systems, the second most prestigious journal in the area of parallel computers, being the first European researcher to serve in this capacity. He is also the only Spanish researcher who has served as associate editor of both of these journals.
  • He was the General Co-Chair for the 2001 International Conference on Parallel Processing and played a vital role in bringing this prestigious conference to Valencia, being the first time that it was held in Europe. This is the oldest conference in the area of parallel computers (at that time it was in its 30th year). He served as the Program Chair (Chair of the Scientific Program Committee) of the 2004 International Symposium on High Performance Computer Architecture. This is one of the two most relevant international conferences in the broad area of computer architecture. Also, he served as Program Co-Chair of the 2005 International Conference on Parallel Processing, and as Steering Committee Member, Program Co-Chair, Program Vice-Chair or Program Committee Member in more than 55 international conferences and workshops, including the most prestigious ones in the area of parallel computers: ISCA, HPCA, ICS, ICPP, IPPS/SPDP, IPDPS, HiPC, and Euro-Par.
  • He collaborated with several researchers from foreign countries, including some of the most prestigious professors in the area of interconnection networks (Lionel M. Ni, Sudhakar Yalamanchili, Chita Das, Timothy M. Pinkston, Dhabaleswar K. Panda, and Anand Sivasubramaniam, all of them currently serving or having served as associate editors of one or both of the two most prestigious journals in the area of computers). In particular, he co-authored 51 papers with researchers from six USA universities, a USA national laboratory, and two USA companies, and 22 additional papers with researchers from five European and Asian universities and a European company. He also filed a joint patent with a Norwegian researcher and a U.S. researcher.
  • He was invited to present keynote speeches in several international conferences (eight) as well as invited talks in several universities and national laboratories in the USA (nine), Europe (two) and Asia (one), including University of Illinois at Urbana-Champaign, University of Southern California, Georgia Institute of Technology, Ohio State University, Michigan State University, Pennsylvania State University, and Los Alamos National Laboratory. He was also invited to present talks at the research laboratories of some leading computer companies (IBM, Compaq, Sun Microsystems, Intel). Also, he was invited to participate in panel sessions in several international conferences and workshops (eight).
  • He was invited to send recommendation letters to support the promotion of several Professors in the USA to Associate Professor and Full Professor, as well as award nominations. He also served on the PhD dissertation committee of several doctoral candidates at various universities in the USA, Canada and Europe.

 

An important aspect of the research developed by José Duato is that he followed a disruptive approach in several cases. Under this approach, existing solutions are discarded and a completely new and superior solution is proposed, analyzed, and evaluated, thus making the existing solutions obsolete. Some examples of this approach follow:

  • Adaptive routing techniques that allow cyclic dependencies between network resources. This is a counterintuitive approach because it appears at first glance that deadlocks may form when allowing cyclic dependencies between network links or buffers. Only a complex mathematical proof can show that deadlock freedom can be guaranteed if certain conditions are met. This research was so disruptive when it was developed that it was rejected by several peers and considered to be incorrect, even by the most prominent researchers at that time. However, it was finally accepted and several well-known researchers developed their own version of this theory. The benefit of this disruptive approach is a dramatic reduction in the number of resources required to implement fully adaptive routing. This drastic reduction in the number of resources has led all the supercomputer designers who wanted to implement adaptive routing in their interconnect to select Duato's technique as the most suitable and efficient one.
  • Dynamic network reconfiguration techniques for lossless networks. It was thought that dynamic network reconfiguration was not possible in a lossless network like the ones used in most high-performance clusters. The reason is that routing tables for the different routers or switches cannot be synchronously updated, and therefore, routing tables for the old and new network configurations may coexist, usually leading to deadlocks. As a consequence, all the commercial products (e.g. Myrinet) are based on static reconfiguration, which causes very significant performance losses every time there is a change in the topology. Again, we developed some disruptive research showing that, although routing tables cannot be asynchronously updated without introducing deadlocks, it is possible to asynchronously perform several rounds of partial routing table reconfigurations in such a way that deadlocks are avoided. Again, a complex theory was required to prove deadlock freedom. This research opened the door to new and much more powerful network reconfiguration strategies.
  • Network congestion management techniques that eliminate the negative effects of congestion instead of eliminating congestion. In order to use networks in a cost-effective manner, the working point of the network should be close to saturation, but performance degrades dramatically when entering saturation because congestion trees grow very quickly, introducing head-of-line (HOL) blocking among packets. Congestion trees were identified more than two decades ago. Many solutions have been proposed, but none of them is scalable with respect to either link bandwidth or network size. Again, we proposed a disruptive approach here. Instead of eliminating congestion, we just eliminated the negative effects produced by congestion. Our approach is to eliminate HOL blocking, thus allowing blocked packets leading to non-congested destinations to proceed. This is achieved in a scalable manner by implementing a small set (e.g. two to four) of additional queues at every switch input port, which are dynamically allocated to congested packet flows whenever congestion is detected. By doing so, packets belonging to congested flows are moved to set-aside queues (SAQs) and packets leading to non-congested destinations are able to proceed, effectively eliminating HOL blocking with a small number of resources. Moreover, as our congestion management scheme reacts immediately and locally, it does not introduce the performance degradation and instability problems that typically arise in traditional congestion management techniques.
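
The set-aside queue idea described in the last item can be illustrated with a minimal sketch. It models a single switch input port only; the class name `InputPort`, its methods, and the way congestion is signalled (an explicit `mark_congested` call and a `blocked` set) are hypothetical choices made for illustration, and it does not model how RECN actually detects congestion or propagates notifications between switches.

```python
from collections import deque


class InputPort:
    """Toy model of one switch input port (hypothetical, for illustration).

    Packets are (destination, payload) tuples. The port has one normal FIFO
    queue plus a small pool of set-aside queues (SAQs) that are allocated
    on demand to destinations reported as congested.
    """

    def __init__(self, num_saqs=2):
        self.normal = deque()     # shared FIFO for non-congested traffic
        self.num_saqs = num_saqs  # small fixed pool, e.g. two to four
        self.saqs = {}            # congested destination -> its SAQ

    def mark_congested(self, dest):
        """Allocate an SAQ to `dest` and move its queued packets into it.

        Returns False when the SAQ pool is exhausted.
        """
        if dest in self.saqs:
            return True
        if len(self.saqs) >= self.num_saqs:
            return False
        self.saqs[dest] = deque(p for p in self.normal if p[0] == dest)
        self.normal = deque(p for p in self.normal if p[0] != dest)
        return True

    def enqueue(self, dest, payload):
        """Store an arriving packet in its SAQ if one exists, else in the FIFO."""
        self.saqs.get(dest, self.normal).append((dest, payload))

    def dequeue(self, blocked):
        """Return the next packet that can advance, or None.

        `blocked` is the set of destinations that cannot accept packets now.
        Only the head of each queue is eligible, as in a real FIFO.
        """
        if self.normal and self.normal[0][0] not in blocked:
            return self.normal.popleft()
        for dest, saq in self.saqs.items():
            if saq and dest not in blocked:
                return saq.popleft()
        return None


port = InputPort(num_saqs=2)
port.enqueue("A", 1)  # destination A is congested
port.enqueue("B", 2)  # destination B is free
stalled = port.dequeue(blocked={"A"})    # head-of-line blocking: nothing moves
port.mark_congested("A")                 # set the congested flow aside
forwarded = port.dequeue(blocked={"A"})  # the packet for B now proceeds
```

In this sketch the packet for B is stuck behind the packet for A until A's flow is moved into its own SAQ; the number of extra queues stays small because they are tied to currently congested flows rather than to every possible destination.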

Summary of scientific work and its impact

The following paragraphs comment on the impact of the research developed by José Duato:

  • On his own, he developed the best adaptive routing techniques for interconnection networks, also formally proving through a necessary and sufficient condition that those routing techniques cannot be improved. This theory is very easy to apply because he also proposed simple design methodologies. Moreover, these algorithms require very few extra hardware resources to be implemented, thus making adaptive routing commercially viable. As a result, these adaptive routing techniques were used in experimental machines, such as the Reliable Router and the M-Machine (developed at MIT). Moreover, these techniques were used in the Cray T3E supercomputer (the fastest supercomputer at the time it was marketed in 1997) and in the on-chip router of the Compaq Alpha 21364, the fastest commercially available microprocessor when it was launched. These routing techniques have also been used in the Cray Black Widow and the IBM BlueGene/L, a 720 Teraflop massively parallel supercomputer with more than 130,000 dual-processor nodes, which is the fastest supercomputer today. The relevance of these results comes from the fact that adaptive routing based on the theory of José Duato drastically increases supercomputer performance at almost no extra cost by allowing much faster communication among processors, and supercomputers are instrumental in many fields. In particular, the BlueGene/L was conceived to accelerate research on protein folding.
  • His recent research aims at setting new standards for commercial interconnection networks. In particular, he developed, in collaboration with Xyratex (a UK company), a technique called Regional Explicit Congestion Notification (RECN), which is the only truly scalable congestion management technique for lossless networks to date. The importance of this result comes from the fact that it prevents the dramatic performance degradation experienced by communication networks when they reach saturation, thus allowing higher performance, better utilization, and hence, lower cost. This result has been protected with two joint patents, and RECN is currently being incorporated into the most important standard currently under development for future communication systems: Advanced Switching Interconnect. This standard extends the functionality of the popular PCI-Express, which is used in almost every computer today. Also, several members in his research group are working on improving InfiniBand, a recently proposed industry standard for communication between processors and input/output devices as well as interprocessor communication. The main results achieved by his research group up to now are a set of InfiniBand-compatible routing algorithms that significantly improve performance over previously proposed ones while being flexible enough to be defined on any topology, a set of mechanisms to extend the InfiniBand standard to support adaptive routing, a subnet manager protocol to implement network reconfiguration when the topology changes due to hot swapping/replacement of components, and a fast algorithm to compute InfiniBand arbitration tables that supports multiple classes of service. Several of these results are currently being considered by Sun Microsystems for their next-generation InfiniBand products. Also, some of these results are being ported to Advanced Switching Interconnect.
  • He structured the knowledge amassed over the years in the area of interconnection networks by presenting the fundamental concepts and state-of-the-art techniques in his book "Interconnection Networks: An Engineering Approach", published in the USA by IEEE Computer Society Press in 1997 and by Morgan Kaufmann in 2003. This book is currently used in the leading computer companies (IBM, Compaq, Intel, Sun Microsystems) by engineers who design interconnection networks. It is also used to teach PhD courses (especially in USA Universities). It is the most popular book on interconnection networks in the market today. Most recent research papers on interconnection networks reference this book. Additionally, due to his high international visibility, he has been invited to write a chapter on interconnection networks for the fourth edition of the book "Computer Architecture: A Quantitative Approach", by John Hennessy and Dave Patterson. This chapter is now complete and the book will be published by Morgan Kaufmann in the coming months. This is by far the most popular book on computer architecture, and almost the only one used for teaching this topic worldwide.
  • He opened new research lines in the area of interconnection networks. Examples of those lines are the design of adaptive routing algorithms with cyclic dependencies between resources, the use of scalable congestion management techniques to prevent performance degradation when the network reaches saturation, and the dynamic reconfiguration of the routing algorithm to support topology changes without stopping network traffic. The first one had a significant impact on industry, as mentioned above. The second line is likely to have a tremendous impact thanks to the joint patents with Xyratex and its inclusion in the Advanced Switching Interconnect standard. The third line has already raised significant interest in academia and industry (Sun Microsystems), and it led to filing a joint patent among three research institutions: Simula Research Laboratory (Norway), University of Southern California (USA) and Universidad Politécnica de Valencia.
  • He has contributed to the state-of-the-art in practically all the topics related to interconnection networks, including the proposal of new topologies, switching techniques, flow control techniques, deadlock detection and recovery techniques, congestion management, unicast and multicast routing algorithms, techniques to compute and enhance fault tolerance, network reconfiguration, scheduling algorithms for router resources, and router design, including support for multimedia traffic.
  • He promoted the creation of a large, multidisciplinary research team in Spain, which he coordinates. In just one decade, more than 50 researchers from five Spanish universities have joined his team. This team recently passed the first evaluation stage for the Consolider-Ingenio 2010 program. Taking into account that only 35 research teams passed this stage, it can be concluded that his research team is among the 35 best research teams in Spain across all areas of science.

Publications

  • Feliu, J., Petit, S., Sahuquillo, J. & Duato, J. (2014). Cache-hierarchy Contention Aware Scheduling in CMPs. IEEE Transactions on Parallel and Distributed Systems, 25(3), 581-590.
  • Feliu, J., Sahuquillo, J., Petit, S. & Duato, J. (2013). Planificación Considerando Degradación de Prestaciones por Contención. In XXIV Jornadas de Paralelismo, JP 2013, Madrid, Sep 17-20, pages 62-67.
  • Reaño, C., Peña, A. J., Silla, F., Mayo, R., Quintana-Ortí, E. S. & Duato, J. (2013). Influence of InfiniBand FDR on the Performance of Remote GPU Virtualization. In International Conference on Cluster Computing (Cluster).
  • Feliu, J., Sahuquillo, J., Petit, S. & Duato, J. (2013). L1-Bandwidth Aware Thread Allocation in Multicore SMT Processors. In 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT'13, Edinburgh, United Kingdom, Sep 7-11, pages 123-132.
  • Feliu, J., Sahuquillo, J., Petit, S. & Duato, J. (2013). Using huge pages and performance counters to determine the LLC architecture. In International Conference on Computational Science, ICCS'13, Barcelona, Jun 5-7.
  • Reaño, C., Peña, A. J., Silla, F., Mayo, R., Quintana-Ortí, E. S. & Duato, J. (2012). CU2rCU: towards the Complete rCUDA Remote GPU Virtualization and Sharing Solution. In 19th Annual International Conference on High Performance Computing (HiPC).
  • Peñaranda, R., Gomez, C., Gomez, M. E., Lopez, P. & Duato, J. (2012). A New Family of Hybrid Topologies for Large-Scale Interconnection Networks. IEEE 11th International Symposium on Network Computing and Applications, 220-227.
  • Feliu, J., Sahuquillo, J., Petit, S. & Duato, J. (2012). Planificación considerando el ancho de banda de la jerarquía de cache. In XIII Jornadas de Paralelismo, JP 2012, Elche, Sep 19-21, pages 472-477.
  • Feliu, J., Sahuquillo, J., Petit, S. & Duato, J. (2012). Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling. In 26th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2012, Shanghai, China, May 21-25, pages 508-519.
  • Hernández, C., Roca, A., Silla, F., Flich, J. & Duato, J. (2012). On the Impact of Within-Die Process Variation in GALS-Based NoC Performance. IEEE Trans. on CAD of Integrated Circuits and Systems, 31(2), 294-307.
  • Hernández, C., Silla, F. & Duato, J. (2011). Energy and Performance Efficient Thread Mapping in NoC-Based CMPs under Process Variations. In Parallel Processing (ICPP), 2011 International Conference on, pages 41-50.
  • Escudero-Sahuquillo, J., Gran, E. G., Garcia, P. J., Flich, J., Skeie, T., Lysne, O. et al. (2011). Combining Congested-Flow Isolation and Injection Throttling in HPC Interconnection Networks. In Parallel Processing (ICPP), 2011 International Conference on, pages 662-672.
  • Duato, J., Peña, A. J., Silla, F., Mayo, R. & Quintana-Ortí, E. S. (2011). Performance of CUDA Virtualized Remote GPUs in High Performance Clusters. In Parallel Processing (ICPP), 2011 International Conference on, pages 365-374.
  • Roca, A., Hernández, C., Flich, J., Silla, F. & Duato, J. (2011). A Distributed Switch Architecture for On-Chip Networks. In Parallel Processing (ICPP), 2011 International Conference on, pages 21-30.
  • Camacho Villanueva, J., Flich, J., Roca, A. & Duato, J. (2011). PC-Mesh: A Dynamic Parallel Concentrated Mesh. In Parallel Processing (ICPP), 2011 International Conference on, pages 642-651.
  • Cuesta Sáez, B., Ros, A., Gomez, M. E., Robles, A. & Duato, J. (2011). Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks. In 38th International Symposium on Computer Architecture (ISCA), pages 93-103. San Jose (California): Association for Computing Machinery (ACM).
  • Rodrigo, S., Flich, J., Roca, A., Medardoni, S., Bertozzi, D., Camacho Villanueva, J. et al. (2011). Cost-Efficient On-Chip Routing Implementations for CMP and MPSoC Systems. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 30(4), 534-547.
  • Duato, J., Peña, A. J., Silla, F., Mayo, R. & Quintana-Ortí, E. S. (2011). Enabling CUDA acceleration within virtual machines using rCUDA. Proceedings of HiPC 2011.
  • Hernández, C., Roca, A., Flich, J., Silla, F. & Duato, J. (2011). Fault-Tolerant Vertical Link Design for Effective 3D Stacking. IEEE Computer Architecture Letters, 99(RapidPosts).
  • Sem-Jacobsen, F. O., Skeie, T., Lysne, O. & Duato, J. (2011). Dynamic Fault Tolerance in Fat Trees. IEEE Transactions on Computers, 60(4), 508-525.
  • Gomez, C., Gomez, M. E., Lopez, P. & Duato, J. (2011). How to reduce packet dropping in a bufferless NoC. Concurrency and Computation: Practice and Experience, 23(1), 86-99.
  • Hernández, C., Roca, A., Flich, J., Silla, F. & Duato, J. (2011). Characterizing the impact of process variation on 45 nm NoC-based CMPs. Journal of Parallel and Distributed Computing, 71(5), 651-663.
  • Escudero-Sahuquillo, J., Garcia, P. J., Quiles, F. J., Flich, J. & Duato, J. (2011). Cost-effective queue schemes for reducing head-of-line blocking in fat-trees. Concurrency Computation Practice and Experience, 12(15).
