Contact

Position:
Full Professor
Address:
Valencia
Phone:
+34963877007x72111

Image & Curriculum Vitae

Publications

  1. Blas Cuesta Sáez, Alberto Ros, Maria E Gomez, Antonio Robles and Jose Duato. Increasing the Effectiveness of Directory Caches by Deactivating Coherence for Private Memory Blocks. In 38th International Symposium on Computer Architecture (ISCA). June 2011, 93–103. URL BibTeX

    @conference{bcuesta-isca11,
    	author = "Cuesta S{\'a}ez, Blas and Ros, Alberto and Gomez, Maria E. and Robles, Antonio and Duato, Jose",
    	address = "San Jose (California)",
    	booktitle = "38th International Symposium on Computer Architecture (ISCA)",
    	isbn = "978-1-4503-0472-6",
    	month = "jun",
    	pages = "93--103",
    	publisher = "Association for Computing Machinery (ACM)",
    	title = "{I}ncreasing the {E}ffectiveness of {D}irectory {C}aches by {D}eactivating {C}oherence for {P}rivate {M}emory {B}locks",
    	url = "http://skywalker.inf.um.es/~aros/papers/bcuesta-isca11.pdf",
    	year = 2011
    }
    
  2. Alberto Ros, Blas Cuesta Sáez, Ricardo Fernández-Pascual, Maria E Gomez, Manuel E Acacio, Antonio Robles, José M García and Jose Duato. EMC^2: Extending Magny-Cours Coherence for Large-Scale Servers. In 17th Int'l Conference on High Performance Computing (HiPC) In Press, Accepted. December 2010. BibTeX

    @conference{aros-hipc10,
    	author = "Ros, Alberto and Cuesta S{\'a}ez, Blas and Ricardo Fern{\'a}ndez-Pascual and Gomez, Maria E. and Manuel E. Acacio and Robles, Antonio and Jos{\'e} M. Garc{\'i}a and Duato, Jose",
    	abstract = "The demand of larger and more powerful high-performance shared-memory servers is growing over the last few years. To meet this need, AMD has recently launched the twelve-core Magny-Cours processors. They include a directory cache (Probe Filter) that increases the scalability of the coherence protocol applied by Opterons, based on coherent HyperTransport interconnect (cHT). cHT limits up to 8 the number of nodes that can be addressed. Recent High Node Count HT specification overcomes this limitation. However, the 3-bit pointer used by the Probe Filter prevents Magny-Cours-based servers from being built beyond 8 nodes. In this paper, we propose and develop an external logic to extend the coherence domain of Magny-Cours processors beyond the 8-node limit while maintaining the advantages provided by the Probe Filter. Evaluation results for up to a 32-node system show how the performance offered by our solution scales with the increment in the number of nodes, enhancing the Probe Filter effectiveness by filtering additional messages. Particularly, we reduce runtime by 47% in a 32-die system with respect to the 8-die Magny-Cours system.",
    	address = "Goa, India",
    	booktitle = "17th Int'l Conference on High Performance Computing (HiPC)",
    	month = "December",
    	title = "{EMC}^2: {E}xtending {M}agny-{C}ours {C}oherence for {L}arge-{S}cale {S}ervers",
    	note = "In Press, Accepted",
    	year = 2010
    }
    
  3. Joan-Lluis Ferrer, Elvira Baydal, Antonio Robles, Pedro Lopez and Jose Duato. A Scalable and Early Congestion Management Mechanism for MINs. In Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP 2010. 2010, 43 - 50. URL BibTeX

    @conference{11260741,
    	author = "Ferrer, Joan-Lluis and Baydal, Elvira and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Several packet marking-based mechanisms have been proposed to manage congestion in multistage interconnection networks. One of them, the MVCM mechanism obtains very good results for different network configurations and traffic loads. However, as MVCM applies full virtual output queuing at origin, its memory requirements may jeopardize its scalability. Additionally, the applied packet marking technique introduces certain delay to detect congestion. In this paper, we propose and evaluate the Scalable Early Congestion Management mechanism which eliminates the drawbacks exhibited by MVCM. The new mechanism replaces the full virtual output queuing at origin by either a partial virtual output queuing or a shared buffer, in order to reduce its memory requirements, thus making the mechanism scalable. Also, it applies an improved packet marking technique based on marking packets at output buffers regardless of their marking at input buffers, which simplifies the marking technique, allowing also a sooner detection of the root of a congestion tree.",
    	address = "Piscataway, NJ, USA",
    	booktitle = "Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP 2010",
    	journal = "Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010)",
    	keywords = "multistage interconnection networks;",
    	note = "packet marking based mechanisms;multistage interconnection networks;MVCM mechanism;virtual output queuing;scalable early congestion management mechanism;shared buffer;",
    	pages = "43 - 50",
    	title = "{A} {S}calable and {E}arly {C}ongestion {M}anagement {M}echanism for {MIN}s",
    	url = "http://dx.doi.org/10.1109/PDP.2010.36",
    	year = 2010
    }
    
  4. Blas Cuesta Sáez, Antonio Robles and Jose Duato. Switch-based packing technique for improving token coherence scalability. 2008, 80 - 87. URL BibTeX

    @conference{20090411871352,
    	author = "Cuesta S{\'a}ez, Blas and Robles, Antonio and Duato, Jose",
    	abstract = "Traditional cache coherence protocols either provide low latency cache misses (snooping protocols) or bandwidth efficiency (directory protocols). To simultaneously capture the best attributes of traditional protocols, Token Coherence has been recently proposed. This protocol can quickly resolve cache misses by transient requests. However, since transient requests are unordered messages, they may sometimes fail in solving cache misses mainly due to the occurrence of protocol races. Thus, when the completion of cache misses is not possible by transient requests, Token Coherence uses a starvation prevention mechanism to ensure their completion. Although several implementation options of starvation prevention mechanisms have been proposed, all of them are broadcast-based. This fact represents a large detriment to the Token Coherence scalability. To tackle this problem, in this work we apply a switch-based packing technique that alleviates the harm of broadcast messages and improves the protocol scalability. © 2008 IEEE.",
    	address = "Dunedin, Otago, New Zealand",
    	booktitle = "Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings",
    	key = "Coherent light",
    	keywords = "Multiprocessing systems;Scalability;",
    	note = "Bandwidth efficiencies;Broadcast messages;Cache coherence protocols;Cache misses;Directory protocols;Low latencies;Packing techniques;Protocol scalabilities;Token coherences;",
    	pages = "80 - 87",
    	title = "{S}witch-based packing technique for improving token coherence scalability",
    	url = "http://dx.doi.org/10.1109/PDCAT.2008.25",
    	year = 2008
    }
    
  5. Joan-Lluis Ferrer, Elvira Baydal, Antonio Robles, Pedro Lopez and Jose Duato. On the influence of the packet marking and injection control schemes in congestion management for MINs. 2008, 930 - 9. URL BibTeX

    @conference{10528096,
    	author = "Ferrer, Joan-Lluis and Baydal, Elvira and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Several Congestion Management Mechanisms (CMMs) have been proposed for Multistage Interconnection Networks (MINs) in order to avoid the degradation of network performance when congestion appears. Most of them are based on Explicit Congestion Notification (ECN). For this purpose, switches detect congestion and, depending on the applied mechanism, some flags are marked to warn the source hosts. In response, source hosts apply corrective actions to adjust their packet injection rate. These mechanisms have been evaluated by analyzing whether they are able to manage a congestion situation but there is not a comparison study among them. Moreover, marking effects are not separately analyzed from corrective actions. In this paper, we analyze the current proposals for CMMs, showing the impact of the applied packet marking techniques as well as the corrective actions they apply.",
    	address = "Berlin, Germany",
    	booktitle = "Euro-Par 2008 Parallel Processing. 14th International Euro-Par Conference",
    	keywords = "multistage interconnection networks;packet switching;telecommunication congestion control;",
    	note = "packet marking;injection control schemes;congestion management mechanisms;multistage interconnection networks;explicit congestion notification;message throttling;",
    	pages = "930 - 9",
    	title = "{O}n the influence of the packet marking and injection control schemes in congestion management for {MIN}s",
    	url = "http://dx.doi.org/10.1007/978-3-540-85451-7_100",
    	year = 2008
    }
    
  6. Blas Cuesta Sáez, Antonio Robles and Jose Duato. Improving token coherence by multicast coherence messages. 2008, 269 - 73. URL BibTeX

    @conference{9904937,
    	author = "Cuesta S{\'a}ez, Blas and Robles, Antonio and Duato, Jose",
    	abstract = "Token coherence is a cache coherence protocol that joins the main advantages of traditional protocols. However, unlike them, token coherence does not handle messages in order, which may lead to races, causing some cache misses not to be solved. To assure their completion, an inefficient mechanism named persistent requests is used. Recently we have proposed the priority request mechanism to efficiently handle races. As acknowledgements are not required, a single node can solve several misses for the same memory block at the same time. When solving a lot of misses, the node may become a bottleneck. To avoid it, in this work we propose the multicast coherence message, which allows to simultaneously resolve several misses by using only one response message. It reduces the network traffic and the average response latency, improving significantly the overall performance.",
    	address = "Piscataway, NJ, USA",
    	booktitle = "2008 16th Euromicro Conference on Parallel, Distributed and Network-based Processing - PDP '08",
    	keywords = "cache storage;multicast protocols;routing protocols;",
    	note = "token coherence;multicast coherence messages;cache coherence protocol;priority request mechanism;network traffic;average response latency;",
    	pages = "269 - 73",
    	title = "{I}mproving token coherence by multicast coherence messages",
    	url = "http://dx.doi.org/10.1109/PDP.2008.36",
    	year = 2008
    }
    
  7. J Forment, Francisco Gilabert, Antonio Robles, V Conejero, F Nuez and J Blanca. EST2uni: an open tool for parallel, automated EST analysis and database creation, with a powerful data mining tool. In 2nd International Conference on Bioinformatics Research and Development. 2008, 67 - 72. BibTeX

    @conference{11172720,
    	author = "J. Forment and Gilabert, Francisco and Robles, Antonio and V. Conejero and F. Nuez and J. Blanca",
    	abstract = "We present EST2uni, an integrated, highly-configurable EST analysis pipeline and data mining software package that automates the pre-processing, clustering, annotation, database creation, and data mining of EST collections. The pipeline uses Perl to run standard EST analysis tools, and the code has a modular design to facilitate the addition of new analytical methods and their configuration: Currently implemented analyses include functional and structural annotation, SNP and microsatellite discovery, integration of previously known genetic marker data and gene expression results, and assistance in cDNA microarray design. It can be run in parallel in a PC cluster in order to reduce the time necessary for the analysis. It uses PHP to create a Web site linked to the database, showing collection statistics, with complex query capabilities and tools for data mining and retrieval. The code is freely available under the GPL license and is under active development to incorporate new analyses, methods, and algorithms as they are released by the bioinformatics community.",
    	address = "Linz, Austria",
    	booktitle = "2nd International Conference on Bioinformatics Research and Development",
    	journal = "2nd International Conference on Bioinformatics Research and Development, Poster Presentations",
    	keywords = "biology computing;data mining;information retrieval;software packages;software tools;Web sites;",
    	note = "EST2uni;automated EST analysis pipeline;parallel EST analysis pipeline;database creation;data mining tool;data mining software package;Perl;microsatellite discovery;genetic marker data;cDNA microarray design;PHP;Web site;data retrieval;",
    	pages = "67 - 72",
    	title = "{EST}2uni: an open tool for parallel, automated {EST} analysis and database creation, with a powerful data mining tool",
    	year = 2008
    }
    
  8. Antonio Robles, Aurelio Bermudez, Rafael Casado, Francisco J Quiles, Tor Skeie and Jose Duato. A proposal for managing ASI fabrics. Journal of Systems Architecture 54(7):664 - 678, 2008. URL BibTeX

    @article{20083011398981,
    	author = "Robles, Antonio and Aurelio Bermudez and Rafael Casado and Francisco J. Quiles and Tor Skeie and Duato, Jose",
    	abstract = "Recent years, computer performance has been significantly increased. As a consequence, data I/O systems have become bottlenecks within systems. To alleviate this problem, Advanced Switching was recently proposed as a new standard for future interconnects. The Advanced Switching specification establishes a fabric management infrastructure, which is in charge of updating the set of fabric paths each time a topological change takes place. The use of source routing and passive switches makes unfeasible the adaptation to this new technology of many existing proposals to handle topological changes in switched interconnection networks. This paper presents a fabric management mechanism for Advanced Switching, but also suitable for other source routing interconnects. Furthermore, the work presents a detailed performance evaluation for this proposal. This evaluation allows us to identify the main drawbacks of the mechanism and to define future improvements. © 2007 Elsevier B.V. All rights reserved.",
    	address = "P.O. Box 211, Amsterdam, 1000 AE, Netherlands",
    	issn = 13837621,
    	journal = "Journal of Systems Architecture",
    	key = "Fabrics",
    	keywords = "Mechanisms;Standards;Switching circuits;Topology;",
    	note = "Advanced switching;Computer performance;Elsevier (CO);I/O systems;Management Infrastructure;New technologies;Passive switches;Performance evaluation (PE);Source routing;Topological changes;",
    	number = 7,
    	pages = "664 - 678",
    	title = "{A} proposal for managing {ASI} fabrics",
    	url = "http://dx.doi.org/10.1016/j.sysarc.2007.12.002",
    	volume = 54,
    	year = 2008
    }
    
  9. Joan-Lluis Ferrer, Elvira Baydal, Antonio Robles, Pedro Lopez and Jose Duato. Congestion management in MINs through marked validated packets. 2007, 260 - 7. BibTeX

    @conference{10266202,
    	author = "Ferrer, Joan-Lluis and Baydal, Elvira and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = {Congestion management is a very critical problem tackled in interconnection networks for years but not solved yet. Although several mechanisms have been recently proposed for lossless multistage interconnection networks (MINs), they either have drawbacks or are partial solutions. Some of them introduce penalty over packets not really addressed to the hot-spots, whereas others can cope only with congestion situations that last a short time. In this paper, we propose an effective and efficient congestion management mechanism for lossless interconnection networks based on explicit congestion notification. The mechanism uses two different flags in ACK packets, a Marking Bit (MB) and a Validation Bit (VB), to detect congestion and warn the origin hosts. In this way, packets belonging to "coldflows" but stopped because of head-of-line (HOL) blocking can be distinguished from "hotflow" packets which are really causing congestion. In response, origin hosts can apply corrective actions only to the "hotflows", minimizing the negative impact on "coldflows" performance. Evaluation results show that the proposed congestion management strategy is able to avoid the degradation of network performance, regardless of traffic load and the location of the congestion in the network.},
    	address = "Piscataway, NJ, USA",
    	booktitle = "15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP'07)",
    	keywords = "multistage interconnection networks;",
    	note = "congestion management;lossless multistage interconnection network;validated packet;marked packet;ACK packet;head-of-line blocking;marking bit;validation bit;",
    	pages = "260 - 7",
    	title = "{C}ongestion management in {MIN}s through marked validated packets",
    	year = 2007
    }
    
  10. Hilario Lopez, Antonio Robles, Ivan Machon, Eva Fernandez and Luis Fernando Sancho. Temperature monitoring system in the mould of a slab continuous casting line. In 2007 IEEE INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS, PROCEEDINGS, VOLS 1-8. 2007, 175-179. BibTeX

    @conference{ISI:000252265100032,
    	author = "Hilario Lopez and Robles, Antonio and Ivan Machon and Eva Fernandez and Luis Fernando Sancho",
    	abstract = "In this article a study is introduced that was carried out for the implementation of a temperature monitoring system in the mould of a slab continuous casting line in ACERALIA's LD3 steel factory in Aviles (Asturias). To achieve this, instrumentation has been proposed consisting on precision thermocouples placed along the vertical mid-line of both the broad face and the narrow face of the mould. Signals are converted and sent through an industrial bus to the acquisition station. Here, the data from the process computer (the conditions under which the casting develops) is also stored. The ultimate objective is the retrieval of actual data on temperatures at specific locations of the mould. These data can be used for the adjustment of models of mould operative behaviour.",
    	booktitle = "2007 IEEE INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS, PROCEEDINGS, VOLS 1-8",
    	isbn = "978-1-4244-0754-5",
    	note = "IEEE International Symposium on Industrial Electronics, Vigo, SPAIN, JUN 04-07, 2007",
    	pages = "175-179",
    	title = "{T}emperature monitoring system in the mould of a slab continuous casting line",
    	year = 2007
    }
    
  11. Blas Cuesta Sáez, Antonio Robles and Jose Duato. Improving token coherence by Multicast Coherence Messages. In D ElBaz, J Bourgeois and F Spies (eds.). PROCEEDINGS OF THE 16TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING. 2007, 269-273. BibTeX

    @conference{ISI:000254266500036,
    	author = "Cuesta S{\'a}ez, Blas and Robles, Antonio and Duato, Jose",
    	abstract = "Token Coherence is a cache coherence protocol that joins the main advantages of traditional protocols. However, unlike them, Token Coherence does not handle messages in order, which may lead to races, causing some cache misses not to be solved. To assure their completion, an inefficient mechanism named persistent requests is used. Recently we have proposed the priority request mechanism to efficiently handle races. As acknowledgements are not required, a single node can solve several misses for the same memory block at the same time. When solving a lot of misses, the node may become a bottleneck. To avoid it, in this work we propose the Multicast Coherence Message, which allows to simultaneously resolve several misses by using only one response message. It reduces the network traffic and the average response latency, improving significantly the overall performance.",
    	booktitle = "PROCEEDINGS OF THE 16TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING",
    	editor = "ElBaz, D and Bourgeois, J and Spies, F",
    	isbn = 9780769530895,
    	issn = "1066-6192",
    	note = "16th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, Toulouse, FRANCE, FEB 13-15, 2008",
    	pages = "269-273",
    	series = "Euromicro Workshop on Parallel and Distributed Processing",
    	title = "{I}mproving token coherence by {M}ulticast {C}oherence {M}essages",
    	year = 2007
    }
    
  12. Blas Cuesta Sáez, Antonio Robles and Jose Duato. An effective starvation avoidance mechanism to enhance the token coherence protocol. In P DAmbra and MR Guarracino (eds.). 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing, Proceedings. 2007, 47-54. BibTeX

    @conference{ISI:000245942700007,
    	author = "Cuesta S{\'a}ez, Blas and Robles, Antonio and Duato, Jose",
    	abstract = "Shared-memory multiprocessors are becoming to be formed by an increasingly larger number of nodes. In these systems, implementing cache coherence is a key issue. Token Coherence is a low latency cache coherence protocol that avoids indirection for cache-to-cache misses and which does not require a totally-ordered interconnect. When races are rare, the protocol performs well thanks to the performance policy. Unfortunately, some medium/large systems and some applications that often access the same data simultaneously make races more common. As a result, the protocol does not perform as well as it could because it uses the persistent request mechanism to prevent starvation. This mechanism is too slow and inflexible because it overrides the performance policy. In consequence, the protocol slows down the system and does not take advantage of the flexibility and speed of the common case. We propose a new mechanism, namely priority requests, which replaces the persistent request one. Our mechanism solves races, while still respecting the performance policy, simply by ordering and giving a higher priority to requests suffering from starvation. Thus, our mechanism handles the tokens more efficiently and reduces the network traffic.",
    	booktitle = "15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing, Proceedings",
    	editor = "DAmbra, P and Guarracino, MR",
    	isbn = 9780769527840,
    	note = "15th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, Naples, ITALY, FEB 07-09, 2007",
    	pages = "47-54",
    	title = "{A}n effective starvation avoidance mechanism to enhance the token coherence protocol",
    	year = 2007
    }
    
  13. Maria E Gomez, N A Nordbotten, Jose Flich, Pedro Lopez, Antonio Robles, Jose Duato, T Skeie and O Lysne. A routing methodology for achieving fault tolerance in direct networks. Computers, IEEE Transactions on 55(4):400 - 415, April 2006. URL, DOI BibTeX

    @article{1608003,
    	author = "Gomez, Maria E. and N.A. Nordbotten and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose and T. Skeie and O. Lysne",
    	abstract = "Massively parallel computing systems are being built with thousands of nodes. The interconnection network plays a key role for the performance of such systems. However, the high number of components significantly increases the probability of failure. Additionally, failures in the interconnection network may isolate a large fraction of the machine. It is therefore critical to provide an efficient fault-tolerant mechanism to keep the system running, even in the presence of faults. This paper presents a new fault-tolerant routing methodology that does not degrade performance in the absence of faults and tolerates a reasonably large number of faults without disabling any healthy node. In order to avoid faults, for some source-destination pairs, packets are first sent to an intermediate node and then from this node to the destination node. Fully adaptive routing is used along both subpaths. The methodology assumes a static fault model and the use of a checkpoint/restart mechanism. However, there are scenarios where the faults cannot be avoided solely by using an intermediate node. Thus, we also provide some extensions to the methodology. Specifically, we propose disabling adaptive routing and/or using misrouting on a per-packet basis. We also propose the use of more than one intermediate node for some paths. The proposed fault-tolerant routing methodology is extensively evaluated in terms of fault tolerance, complexity, and performance.",
    	doi = "10.1109/TC.2006.46",
    	issn = "0018-9340",
    	journal = "Computers, IEEE Transactions on",
    	keywords = "adaptive routing; checkpoint-restart mechanism; direct networks; fault-tolerant routing methodology; interconnection network; parallel computing system; fault tolerant computing; multiprocessor interconnection networks; network routing; parallel processi",
    	month = "april",
    	number = 4,
    	pages = "400 - 415",
    	title = "{A} routing methodology for achieving fault tolerance in direct networks",
    	url = "http://dx.doi.org/10.1109/TC.2006.46",
    	volume = 55,
    	year = 2006
    }
    
  14. Maria E Gomez, N A Nordbotten, Jose Flich, Pedro Lopez, Antonio Robles, Jose Duato, T Skeie and O Lysne. A routing methodology for achieving fault tolerance in direct networks. IEEE Transactions on Computers 55(4):400 - 15, 2006. URL, DOI BibTeX

    @article{8935111,
    	author = "Gomez, Maria E. and N.A. Nordbotten and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose and T. Skeie and O. Lysne",
    	abstract = "Massively parallel computing systems are being built with thousands of nodes. The interconnection network plays a key role for the performance of such systems. However, the high number of components significantly increases the probability of failure. Additionally, failures in the interconnection network may isolate a large fraction of the machine. It is therefore critical to provide an efficient fault-tolerant mechanism to keep the system running, even in the presence of faults. This paper presents a new fault-tolerant routing methodology that does not degrade performance in the absence of faults and tolerates a reasonably large number of faults without disabling any healthy node. In order to avoid faults, for some source-destination pairs, packets are first sent to an intermediate node and then from this node to the destination node. Fully adaptive routing is used along both subpaths. The methodology assumes a static fault model and the use of a checkpoint/restart mechanism. However, there are scenarios where the faults cannot be avoided solely by using an intermediate node. Thus, we also provide some extensions to the methodology. Specifically, we propose disabling adaptive routing and/or using misrouting on a per-packet basis. We also propose the use of more than one intermediate node for some paths. The proposed fault-tolerant routing methodology is extensively evaluated in terms of fault tolerance, complexity, and performance",
    	address = "USA",
    	doi = "10.1109/TC.2006.46",
    	issn = "0018-9340",
    	journal = "IEEE Transactions on Computers",
    	keywords = "fault tolerant computing;multiprocessor interconnection networks;network routing;parallel processing;",
    	note = "direct networks;parallel computing system;interconnection network;fault-tolerant routing methodology;adaptive routing;checkpoint-restart mechanism;",
    	number = 4,
    	pages = "400 - 15",
    	title = "{A} routing methodology for achieving fault tolerance in direct networks",
    	url = "http://dx.doi.org/10.1109/TC.2006.46",
    	volume = 55,
    	year = 2006
    }
    
  15. J M Montañana, Jose Flich, Antonio Robles and Jose Duato. Reachability-based fault-tolerant routing. In Parallel and Distributed Systems, 2006. ICPADS 2006. 12th International Conference on 1. 2006, 10 pp.. URL, DOI BibTeX

    @conference{1655699,
    	author = "Monta{\~n}ana, J. M. and Flich, Jose and Robles, Antonio and Duato, Jose",
    	abstract = "Clusters of PCs are being used as cost-effective alternative to large parallel computers. In most of them it is critical to keep the system running even in the presence of faults. As the number of nodes increases in these systems, the interconnection network grows accordingly. Along with the increase in components the probability of faults increases dramatically, and thus, fault-tolerance in the system, in general, and in the interconnection network, in particular, plays a key role. An interesting approach to provide fault-tolerance consists of migrating on fly the paths affected by the failure to new fault-free paths. In this paper, we propose a simple and effective fault-tolerant routing methodology, referred to as reachability based fault tolerant routing (RFTR), that can be applied to any topology. RFTR builds new alternative paths by joining subpaths extracted from the set of already computed paths, thus being time-efficient. In order to avoid deadlocks, RFTR performs, if required, a virtual channel transition on the subpath union. As an example of applicability, in this paper we apply RFTR to InfiniBand. Evaluation results on tori show that RFTR exhibits a low computation cost and does not degrade performance significantly",
    	booktitle = "Parallel and Distributed Systems, 2006. ICPADS 2006. 12th International Conference on",
    	doi = "10.1109/ICPADS.2006.89",
    	isbn = "0-7695-2612-8",
    	issn = "1521-9097",
    	keywords = "PC clusters;interconnection network;parallel computers;reachability-based fault-tolerant routing;virtual channel transition;fault tolerant computing;reachability analysis;telecommunication network routing;workstation clusters;",
    	pages = "10 pp.",
    	title = "{R}eachability-based fault-tolerant routing",
    	url = "http://dx.doi.org/10.1109/ICPADS.2006.89",
    	volume = 1,
    	year = 2006
    }
    
  16. Michihiro Koibuchi, Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Enforcing in-order packet delivery in system area networks with adaptive routing. Journal of Parallel and Distributed Computing 65(10):1223 - 1236, 2005. URL BibTeX

    @article{2005379355213,
    	author = "Michihiro Koibuchi and Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Adaptive routing, which dynamically selects the route of packets, has been widely studied for interconnection networks in massively parallel computers and system area networks. Although adaptive routing has the advantage of providing high bandwidth, it may deliver packets out-of-order, which some message passing libraries do not accept. In this paper, we propose two mechanisms called (1) FIFO transmission and (2) couple limitation to guarantee in-order packet delivery in adaptive routing. Both of them limit packet injection at source hosts. The FIFO transmission completely avoids packet sorting at destination hosts, while the couple limitation uses a few buffers to sort packets at destination hosts. Evaluation results show that the FIFO transmission and the couple limitation achieve a similar throughput to that of a method equipped with huge (infinite) buffers enough to store all out-of-order packets at destination hosts under both synthetic traffic and NAS Parallel Benchmarks. © 2005 Elsevier Inc. All rights reserved.",
    	issn = 07437315,
    	journal = "Journal of Parallel and Distributed Computing",
    	key = "Packet networks",
    	keywords = "Bandwidth;Benchmarking;Interconnection networks;Routers;Telecommunication traffic;",
    	note = "Adaptive routing;In-order packet delivery;PC clusters;System area networks;",
    	number = 10,
    	pages = "1223 - 1236",
    	title = "{E}nforcing in-order packet delivery in system area networks with adaptive routing",
    	url = "http://dx.doi.org/10.1016/j.jpdc.2005.04.007",
    	volume = 65,
    	year = 2005
    }
    
  17. Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez, Jose Duato and M Koibuchi. In-Order Packet Delivery in Interconnection Networks using Adaptive Routing. In Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International. 2005, 101 - 101. DOI BibTeX

    @conference{1419928,
    	author = "Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose and M. Koibuchi",
    	abstract = "Most commercial switch-based network technologies for PC clusters use deterministic routing. Alternatively, adaptive routing could be used to improve network performance. In this case, switches decide the path to reach the destination by using local information about the state of the possible outgoing links. However, there are two drawbacks that discourage adaptive routing from being applied to commercial interconnects. The first one concerns the possible switch complexity increase with respect to deterministic routing. The second drawback is due to the fact that adaptive routing may introduce out-of-order packet delivery, which is not acceptable for some applications. For the best of our knowledge, there are no works that analyze the degree of out-of-order packet delivery caused by different network and traffic conditions. In this paper, we take on such a challenge. We show that only for high traffic conditions (reaching saturation) out-of-order delivery is introduced. Moreover, by using small buffers and simple sorting mechanisms at destination, we show that high network throughput can be obtained at the same time packets are delivered in order. Thus, the paper demonstrates that it is possible to use adaptive routing, while still guaranteeing in-order packet delivery, without using large buffer resources nor degrading significantly its performance.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International",
    	doi = "10.1109/IPDPS.2005.255",
    	keywords = "PC clusters; adaptive routing; deterministic routing; interconnection networks; out-of-order packet delivery; sorting mechanisms; switch-based network technologies; multiprocessor interconnection networks; network routing; packet switching; sorting; work",
    	month = "04-08",
    	pages = "101 - 101",
    	title = "{I}n-{O}rder {P}acket {D}elivery in {I}nterconnection {N}etworks using {A}daptive {R}outing",
    	year = 2005
    }
    
  18. Maria E Gomez, Jose Flich, Pedro Lopez, Antonio Robles, Jose Duato, N A Nordbotten, O Lysne and T Skeie. An effective fault-tolerant routing methodology for direct networks. In Parallel Processing, 2004. ICPP 2004. International Conference on. 2004, 222 - 231 vol.1. URL, DOI BibTeX

    @conference{1327925,
    	author = "Gomez, Maria E. and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose and N.A. Nordbotten and O. Lysne and T. Skeie",
    	abstract = "Current massively parallel computing systems are being built with thousands of nodes, which significantly affect the probability of failure. M. E. Gomez proposed a methodology to design fault-tolerant routing algorithms for direct interconnection networks. The methodology uses a simple mechanism: for some source-destination pairs, packets are first forwarded to an intermediate node, and later, from this node to the destination node. Minimal adaptive routing is used along both subpaths. For those cases where the methodology cannot find a suitable intermediate node, it combines the use of intermediate nodes with two additional mechanisms: disabling adaptive routing and using misrouting on a per-packet basis. While the combination of these three mechanisms tolerates a large number of faults, each one requires adding some hardware support in the network and also introduces some overhead. In this paper, we perform an in-depth detailed analysis of the impact of these mechanisms on network behaviour. We analyze the impact of the three mechanisms separately and combined. The ultimate goal of this paper is to obtain a suitable combination of mechanisms that is able to meet the trade-off between fault-tolerance degree, routing complexity, and performance.",
    	booktitle = "Parallel Processing, 2004. ICPP 2004. International Conference on",
    	doi = "10.1109/ICPP.2004.1327925",
    	issn = "0190-3918",
    	keywords = "direct networks; fault-tolerant routing algorithm; in-depth detailed analysis; interconnection networks; minimal adaptive routing; parallel computing system; communication complexity; fault tolerant computing; multiprocessor interconnection networks; par",
    	month = "aug.",
    	pages = "222 - 231 vol.1",
    	title = "{A}n effective fault-tolerant routing methodology for direct networks",
    	url = "http://dx.doi.org/10.1109/ICPP.2004.1327925",
    	year = 2004
    }
    
  19. JC Sancho, Antonio Robles and Jose Duato. An effective methodology to improve the performance of the Up*/down* routing algorithm. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 15(8):740-754, August 2004. BibTeX

    @article{ISI:000222073200006,
    	author = "JC Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are being considered as a cost-effective alternative to parallel computers. Most NOWs are arranged as a switch-based network and provide mechanisms for discovering the network topology. Hence, they provide support for both regular and irregular topologies, which makes routing and deadlock avoidance quite complicated. Current proposals use the Up{*}/down{*} routing algorithm to remove cyclic dependencies between channels and avoid deadlock. However, routing is considerably restricted and most messages must follow nonminimal paths, increasing latency and wasting resources. In this work, we propose and evaluate a simple and effective methodology to compute Up{*}/down{*} routing tables. The new methodology is based on computing a depth-first search (DFS) spanning tree on the network graph that decreases the number of routing restrictions with respect to the breadth-first search (BFS) spanning tree used by the traditional methodology. Additionally, we propose different heuristic rules for computing the spanning trees to improve the efficiency of Up{*}/down{*} routing. Evaluation results for several different topologies show that computing the Up{*}/down{*} routing tables by using the new methodology increases throughput by a factor of up to 2.48 in large networks with respect to the traditional methodology, and also reduces latency significantly.",
    	issn = "1045-9219",
    	journal = "IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS",
    	month = "AUG",
    	number = 8,
    	pages = "740-754",
    	title = "{A}n effective methodology to improve the performance of the {U}p{*}/down{*} routing algorithm",
    	volume = 15,
    	year = 2004
    }
    
  20. J M Montañana, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. A transition-based fault-tolerant routing methodology for InfiniBand networks. In Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International. April 2004, 186. URL, DOI BibTeX

    @conference{1303198,
    	author = "Monta{\~n}ana, J. M. and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Summary form only given. Currently, clusters of PCs are considered a cost-effective alternative to large parallel computers. As the number of elements increases in these systems, the probability of faults increases dramatically. Therefore, it is critical to keep the system running even in the presence of faults. The interconnection network plays a key role in its performance. InfiniBand (IBA) is a new standard interconnect suitable for clusters. Most of the fault-tolerant routing strategies proposed for massively parallel computers cannot be applied to IBA because routing and virtual channel transitions are deterministic, which prevents packets from avoiding the faults. A possible approach to provide fault-tolerance in IBA consists of using several disjoint paths between every source-destination pair of nodes and selecting the appropriate path at the source host. However, to this end, a routing algorithm able to provide enough disjoint paths, while still guaranteeing deadlock freedom, is required. We propose a simple and effective fault-tolerant methodology for IBA networks that can be applied to any network topology and meets the trade-off between fault-tolerance degree and the number of network resources devoted to it. Preliminary results show that the proposed methodology scales well and supports up to three faults in 2D and five in 3D tori using only two virtual channels.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International",
    	doi = "10.1109/IPDPS.2004.1303198",
    	isbn = "0-7695-2132-0",
    	keywords = "fault tolerant computing;multiprocessor interconnection networks;network topology;parallel machines;telecommunication network routing;workstation clusters;",
    	month = "april",
    	pages = 186,
    	title = "{A} transition-based fault-tolerant routing methodology for {I}nfini{B}and networks",
    	url = "http://dx.doi.org/10.1109/IPDPS.2004.1303198",
    	year = 2004
    }
    
  21. Maria E Gomez, Jose Duato, Jose Flich, Pedro Lopez, Antonio Robles, N A Nordbotten, O Lysne and T Skeie. An Efficient Fault-Tolerant Routing Methodology for Meshes and Tori. Computer Architecture Letters 3(1):3 - 3, 2004. URL, DOI BibTeX

    @article{1650124,
    	author = "Gomez, Maria E. and Duato, Jose and Flich, Jose and Lopez, Pedro and Robles, Antonio and N.A. Nordbotten and O. Lysne and T. Skeie",
    	abstract = "In this paper we present a methodology to design fault-tolerant routing algorithms for regular direct interconnection networks. It supports fully adaptive routing, does not degrade performance in the absence of faults, and supports a reasonably large number of faults without significantly degrading performance. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, at this node, without being ejected, they are adaptively forwarded to their destinations. In order to allow deadlock-free minimal adaptive routing, the methodology requires only one additional virtual channel (for a total of three), even for tori. Evaluation results for a 4 x 4 x 4 torus network show that the methodology is 5-fault tolerant. Indeed, for up to 14 link failures, the percentage of fault combinations supported is higher than 99.96%. Additionally, network throughput degrades by less than 10% when injecting three random link faults without disabling any node. In contrast, a mechanism similar to the one proposed in the BlueGene/L, that disables some network planes, would strongly degrade network throughput by 79%.",
    	doi = "10.1109/L-CA.2004.1",
    	issn = "1556-6056",
    	journal = "Computer Architecture Letters",
    	month = "january-december",
    	number = 1,
    	pages = "3 - 3",
    	title = "{A}n {E}fficient {F}ault-{T}olerant {R}outing {M}ethodology for {M}eshes and {T}ori",
    	url = "http://dx.doi.org/10.1109/L-CA.2004.1",
    	volume = 3,
    	year = 2004
    }
    
  22. Jose C Sancho, Antonio Robles and Jose Duato. An effective methodology to improve the performance of the up*/down* routing algorithm. IEEE Transactions on Parallel and Distributed Systems 15(8):740 - 754, 2004. URL BibTeX

    @article{2004368344586,
    	author = "Jose C. Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are being considered as a cost-effective alternative to parallel computers. Most NOWs are arranged as a switch-based network and provide mechanisms for discovering the network topology. Hence, they provide support for both regular and irregular topologies, which makes routing and deadlock avoidance quite complicated. Current proposals use the Up*/down* routing algorithm to remove cyclic dependencies between channels and avoid deadlock. However, routing is considerably restricted and most messages must follow nonminimal paths, increasing latency and wasting resources. In this work, we propose and evaluate a simple and effective methodology to compute Up*/down* routing tables. The new methodology is based on computing a depth-first search (DFS) spanning tree on the network graph that decreases the number of routing restrictions with respect to the breadth-first search (BFS) spanning tree used by the traditional methodology. Additionally, we propose different heuristic rules for computing the spanning trees to improve the efficiency of Up*/down* routing. Evaluation results for several different topologies show that computing the Up*/down* routing tables by using the new methodology increases throughput by a factor of up to 2.48 in large networks with respect to the traditional methodology, and also reduces latency significantly. © 2004 IEEE.",
    	issn = "1045-9219",
    	journal = "IEEE Transactions on Parallel and Distributed Systems",
    	key = "Computer networks",
    	keywords = "Algorithms;Computer simulation;Computer system recovery;Interconnection networks;Parallel processing systems;Trees;",
    	note = "Deadlock avoidance;Irregular topologies;Routing algorithms;Spanning tree;",
    	number = 8,
    	pages = "740 - 754",
    	title = "{A}n effective methodology to improve the performance of the up*/down* routing algorithm",
    	url = "http://dx.doi.org/10.1109/TPDS.2004.28",
    	volume = 15,
    	year = 2004
    }
    
  23. J C Sancho, Antonio Robles and Jose Duato. An effective methodology to improve the performance of the up*/down* routing algorithm. IEEE Transactions on Parallel and Distributed Systems 15(8):740 - 54, 2004. URL BibTeX

    @article{8115437,
    	author = "J.C. Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are being considered as a cost-effective alternative to parallel computers. Most NOWs are arranged as a switch-based network and provide mechanisms for discovering the network topology. Hence, they provide support for both regular and irregular topologies, which makes routing and deadlock avoidance quite complicated. Current proposals use the up*/down* routing algorithm to remove cyclic dependencies between channels and avoid deadlock. However, routing is considerably restricted and most messages must follow nonminimal paths, increasing latency and wasting resources. We propose and evaluate a simple and effective methodology to compute up*/down* routing tables. The new methodology is based on computing a depth-first search (DFS) spanning tree on the network graph that decreases the number of routing restrictions with respect to the breadth-first search (BFS) spanning tree used by the traditional methodology. Additionally, we propose different heuristic rules for computing the spanning trees to improve the efficiency of up*/down* routing. Evaluation results for several different topologies show that computing the up*/down* routing tables by using the new methodology increases throughput by a factor of up to 2.48 in large networks with respect to the traditional methodology, and also reduces latency significantly",
    	address = "USA",
    	issn = "1045-9219",
    	journal = "IEEE Transactions on Parallel and Distributed Systems",
    	keywords = "concurrency theory;network operating systems;network topology;telecommunication network routing;tree searching;workstation clusters;",
    	note = "up*/down* routing algorithm;networks of workstations;depth-first search spanning tree;network graph;breadth-first search;irregular topologies;deadlock avoidance;",
    	number = 8,
    	pages = "740 - 54",
    	title = "{A}n effective methodology to improve the performance of the up*/down* routing algorithm",
    	url = "http://dx.doi.org/10.1109/TPDS.2004.28",
    	volume = 15,
    	year = 2004
    }
    
  24. Maria E Gomez, Jose Flich, Pedro Lopez, Antonio Robles, Jose Duato, N A Nordbotten, O Lysne and T Skeie. An effective fault-tolerant routing methodology for direct networks. 2004, 222 - 31. BibTeX

    @conference{8279975,
    	author = "Gomez, Maria E. and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose and N.A. Nordbotten and O. Lysne and T. Skeie",
    	abstract = "Current massively parallel computing systems are being built with thousands of nodes, which significantly affect the probability of failure. M. E. Gomez proposed a methodology to design fault-tolerant routing algorithms for direct interconnection networks. The methodology uses a simple mechanism: for some source-destination pairs, packets are first forwarded to an intermediate node, and later, from this node to the destination node. Minimal adaptive routing is used along both subpaths. For those cases where the methodology cannot find a suitable intermediate node, it combines the use of intermediate nodes with two additional mechanisms: disabling adaptive routing and using misrouting on a per-packet basis. While the combination of these three mechanisms tolerates a large number of faults, each one requires adding some hardware support in the network and also introduces some overhead. In this paper, we perform an in-depth detailed analysis of the impact of these mechanisms on network behaviour. We analyze the impact of the three mechanisms separately and combined. The ultimate goal of this paper is to obtain a suitable combination of mechanisms that is able to meet the trade-off between fault-tolerance degree, routing complexity, and performance",
    	address = "Los Alamitos, CA, USA",
    	journal = "2004 International Conference on Parallel Processing",
    	keywords = "communication complexity;fault tolerant computing;multiprocessor interconnection networks;parallel processing;",
    	note = "parallel computing system;fault-tolerant routing algorithm;interconnection networks;minimal adaptive routing;in-depth detailed analysis;direct networks;",
    	pages = "222 - 31",
    	title = "{A}n effective fault-tolerant routing methodology for direct networks",
    	volume = "vol.1",
    	year = 2004
    }
    
  25. Maria E Gomez, Jose Duato, Jose Flich, Pedro Lopez, Antonio Robles, N A Nordbotten, T Skeie and O Lysne. A new adaptive fault-tolerant routing methodology for direct networks. 2004, 462 - 73. BibTeX

    @conference{8426282,
    	author = "Gomez, Maria E. and Duato, Jose and Flich, Jose and Lopez, Pedro and Robles, Antonio and N.A. Nordbotten and T. Skeie and O. Lysne",
    	abstract = "Interconnection networks play a key role in the fault tolerance of massively parallel computers, since faults may isolate a large fraction of the machine containing many healthy nodes. In this paper, we present a methodology to design fully adaptive fault-tolerant routing algorithms for direct interconnection networks that can be applied to different regular topologies. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, from this node, they are adaptively forwarded to their destination. This methodology requires only one additional virtual channel, even for tori. Evaluation results show that the methodology is 7-fault tolerant, and for up to 14 faults, more than 99% of the combinations are tolerated, also without significantly degrading performance in the presence of faults",
    	address = "Berlin, Germany",
    	journal = "High Performance Computing-HiPC 2004. 11th International Conference (Lecture notes in Computer Science Vol.3296)",
    	keywords = "fault tolerant computing;multiprocessor interconnection networks;parallel processing;telecommunication network routing;telecommunication network topology;",
    	note = "adaptive fault-tolerant routing;direct interconnection networks;massively parallel computers;",
    	pages = "462 - 73",
    	title = "{A} new adaptive fault-tolerant routing methodology for direct networks",
    	year = 2004
    }
    
  26. JE Villalobos, JL Sanchez, JA Gamez, JC Sancho and Antonio Robles. A methodology to evaluate the effectiveness of traffic balancing algorithms. In M Danelutto, D Laforenza and M Vanneschi (eds.). EURO-PAR 2004 PARALLEL PROCESSING, PROCEEDINGS 3149. 2004, 891-899. BibTeX

    @conference{ISI:000223792500118,
    	author = "JE Villalobos and JL Sanchez and JA Gamez and JC Sancho and Robles, Antonio",
    	abstract = "Traffic balancing algorithms represent a cost-effective alternative to balance traffic in high performance interconnection networks. The importance of these algorithms is increasing since most of the current network technologies for clusters are either based on source routing or use deterministic routing. In source-routed networks, the host is responsible for selecting the suitable path among the set of paths provided by the routing algorithm. The selection of an optimal path that maximizes the channel utilization is not trivial because of the huge amount of combinations. Traffic balancing algorithms are based on heuristics in order to find an optimal solution. In this paper, we propose a new methodology based on the use of metaheuristic algorithms to evaluate the effectiveness of traffic balancing algorithms. Preliminary results show that the set of paths provided by current traffic balancing algorithms are still far from an optimized solution. Thus, it is worth continuing to design more efficient traffic balancing algorithms.",
    	booktitle = "EURO-PAR 2004 PARALLEL PROCESSING, PROCEEDINGS",
    	editor = "Danelutto, M and Laforenza, D and Vanneschi, M",
    	isbn = 3540229248,
    	issn = "0302-9743",
    	note = "10th International Euro-Par Conference on Parallel Processing, Pisa, ITALY, 2004",
    	pages = "891-899",
    	series = "LECTURE NOTES IN COMPUTER SCIENCE",
    	title = "{A} methodology to evaluate the effectiveness of traffic balancing algorithms",
    	volume = 3149,
    	year = 2004
    }
    
  27. T Skeie, O Lysne, Jose Flich, Pedro Lopez, Antonio Robles and Jose Duato. LASH-TOR: a generic transition-oriented routing algorithm. In Parallel and Distributed Systems, 2004. ICPADS 2004. Proceedings. Tenth International Conference on. 2004, 595 - 604. URL, DOI BibTeX

    @conference{1316144,
    	author = "T. Skeie and O. Lysne and Flich, Jose and Lopez, Pedro and Robles, Antonio and Duato, Jose",
    	abstract = "Cluster networks are seen as the future access networks for multimedia streaming, e-commerce, network storage, etc. For these applications, performance and high availability are particularly crucial. Regular topologies are preferred when performance is the primary concern. However, due to spatial constraints or fault-related issues, the network structure may become irregular, which makes more difficult to find deadlock-free minimal paths. Over the recent years, several solutions have been proposed. One of them is the LASH routing, which enables minimal routing by assigning paths to different virtual layers. In this paper, we propose an extension of LASH in order to reduce the number of required virtual layers by allowing transitions between virtual layers. Evaluation results show that the new routing scheme (LASH-TOR) is able to obtain full minimal routing with a reduced number of virtual channels. For torus and mesh networks, with only two virtual channels, LASH throughput is increased by an average factor of improvement of 3.30 for large networks. For regular networks with some unconnected (faulty) links, equal performance improvements are achieved. Even for highly irregular networks of size up to 128 switches the new routing scheme only needs three virtual channels for guaranteeing minimal routing. Besides, LASH-TOR performs well compared to dimension order routing for mesh and torus networks.",
    	booktitle = "Parallel and Distributed Systems, 2004. ICPADS 2004. Proceedings. Tenth International Conference on",
    	doi = "10.1109/ICPADS.2004.1316144",
    	isbn = "0-7695-2152-5",
    	issn = "1521-9097",
    	keywords = "LASH routing; LASH-TOR; access networks; cluster networks; deadlock-free minimal paths; e-commerce; mesh network; multimedia streaming; network storage; network structure; spatial constraints; torus network; transition-oriented routing algorithm; virtual",
    	month = "7-9",
    	pages = "595 - 604",
    	title = "{LASH}-{TOR}: a generic transition-oriented routing algorithm",
    	url = "http://dx.doi.org/10.1109/ICPADS.2004.1316144",
    	year = 2004
    }
    
  28. N A Nordbotten, Maria E Gomez, Jose Flich, Pedro Lopez, Antonio Robles, T Skeie, O Lysne and Jose Duato. A fully adaptive fault-tolerant routing methodology based on intermediate nodes. 2004, 341 - 56. BibTeX

    @conference{8322959,
    	author = "N.A. Nordbotten and Gomez, Maria E. and Flich, Jose and Lopez, Pedro and Robles, Antonio and T. Skeie and O. Lysne and Duato, Jose",
    	abstract = "Massively parallel computing systems are being built with thousands of nodes. Because of the high number of components, it is critical to keep these systems running even in the presence of failures. Interconnection networks play a key-role in these systems, and this paper proposes a fault-tolerant routing methodology for use in such networks. The methodology supports any minimal routing function (including fully adaptive routing), does not degrade performance in the absence of faults, does not disable any healthy node, and is easy to implement both in meshes and tori. In order to avoid network failures, the methodology uses a simple mechanism: for some source-destination pairs, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network). The methodology is shown to tolerate a large number of faults (e.g., five/nine faults when using two/three intermediate nodes in a 3D torus). Furthermore, the methodology offers a gracious performance degradation: in an 8 × 8 × 8 torus network with 14 faults the throughput is only decreased by 6.49%",
    	address = "Germany, Germany",
    	journal = "Network and Parallel Computing. IFIP International Conference, NPC 2004. Proceedings (Lecture Notes in Computer Science Vol.3222)",
    	keywords = "fault tolerant computing;multiprocessor interconnection networks;packet switching;parallel processing;telecommunication network routing;",
    	note = "fully adaptive fault-tolerant routing;intermediate nodes;massively parallel computing systems;interconnection networks;minimal routing function;network failures;source-destination pairs;",
    	pages = "341 - 56",
    	title = "{A} fully adaptive fault-tolerant routing methodology based on intermediate nodes",
    	year = 2004
    }
    
  29. Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Supporting fully adaptive routing in InfiniBand networks. In Parallel and Distributed Processing Symposium, 2003. Proceedings. International. April 2003, 10 pp.. URL, DOI BibTeX

    @conference{1213130,
    	author = "Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "InfiniBand is a new standard for communication between processing nodes and I/O devices as well as for interprocessor communication. The InfiniBand Architecture (IBA) supports distributed routing. However, routing in IBA is deterministic because forwarding tables store a single output port per destination ID. This prevents packets from using alternative paths when the requested output port is busy. Despite the fact that alternative paths could be selected at the source node to reach the same destination node, this is not effective enough to improve network performance. However, using adaptive routing could help to circumvent the congested areas in the network, leading to an increment in performance. In this paper, we propose a simple strategy to implement forwarding tables for IBA switches that support adaptive routing while still maintaining compatibility with the IBA specs. Adaptive routing can be enabled or disabled individually for each packet at the source node. Also, the proposed strategy enables the use in IBA of fully adaptive routing algorithms without using additional network resources to improve network performance. Evaluation results show that extending IBA switch capabilities with fully adaptive routing noticeably increases network performance. In particular, network throughput increases up to an average factor of 3.9.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2003. Proceedings. International",
    	doi = "10.1109/IPDPS.2003.1213130",
    	issn = "1530-2075",
    	keywords = "InfiniBand networks; distributed routing; fully adaptive routing; interprocessor communication; network performance; network throughput; processing nodes; computer networks; multiprocessor interconnection networks; performance evaluation;",
    	month = "april",
    	pages = "10 pp.",
    	title = "{S}upporting fully adaptive routing in {I}nfini{B}and networks",
    	url = "http://dx.doi.org/10.1109/IPDPS.2003.1213130",
    	year = 2003
    }
    
  30. Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Supporting adaptive routing in InfiniBand networks. In Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on. 2003, 165 - 172. URL, DOI BibTeX

    @conference{1183583,
    	author = "Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "InfiniBand is a new standard for communication between processing nodes and I/O devices as well as for interprocessor communication. The InfiniBand Architecture (IBA) supports distributed deterministic routing because forwarding tables store a single output port per destination ID. This prevents packets from using alternative paths when the requested output port is busy. Despite the fact that alternative paths could be selected at the source node to reach the same destination node, this is not effective enough to improve network performance. However using adaptive routing could help to circumvent the congested areas in the network, leading to an increment in performance. In this paper we propose a simple strategy to implement forwarding tables for IBA switches that supports adaptive routing while still maintaining compatibility with the IBA specifications. Adaptive routing can be individually enabled or disabled for each packet at the source node. The proposed strategy enables the use in IBA of any adaptive routing algorithm with an acyclic channel dependence graph. In this paper, we have taken advantage of the partial adaptivity provided by the well-known up*/down* routing algorithm. Evaluation results show that extending IBA switch capabilities with adaptive routing may noticeably increase network performance. In particular network throughput improvement can be, on average, as high as 46%.",
    	booktitle = "Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on",
    	doi = "10.1109/EMPDP.2003.1183583",
    	issn = "1066-6192",
    	keywords = "I-O devices; IBA switches; InfiniBand Architecture; InfiniBand networks; acyclic channel dependence graph; adaptive routing; deterministic routing; forwarding tables; interprocessor communication; network performance; network throughput; processing node",
    	month = "feb.",
    	pages = "165 - 172",
    	title = "{S}upporting adaptive routing in {I}nfini{B}and networks",
    	url = "http://dx.doi.org/10.1109/EMPDP.2003.1183583",
    	year = 2003
    }
    
  31. Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Supporting fully adaptive routing in InfiniBand networks. 2003, 10 pp. URL BibTeX

    @conference{7891311,
    	author = "Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "InfiniBand is a new standard for communication between processing nodes and I/O devices as well as for interprocessor communication. The InfiniBand Architecture (IBA) supports distributed routing. However, routing in IBA is deterministic because forwarding tables store a single output port per destination ID. This prevents packets from using alternative paths when the requested output port is busy. Despite the fact that alternative paths could be selected at the source node to reach the same destination node, this is not effective enough to improve network performance. However, using adaptive routing could help to circumvent the congested areas in the network, leading to an increment in performance. In this paper, we propose a simple strategy to implement forwarding tables for IBA switches that support adaptive routing while still maintaining compatibility with the IBA specs. Adaptive routing can be enabled or disabled individually for each packet at the source node. Also, the proposed strategy enables the use in IBA of fully adaptive routing algorithms without using additional network resources to improve network performance. Evaluation results show that extending IBA switch capabilities with fully adaptive routing noticeably increases network performance. In particular, network throughput increases up to an average factor of 3.9",
    	address = "Los Alamitos, CA, USA",
    	journal = "Proceedings International Parallel and Distributed Processing Symposium",
    	keywords = "computer networks;multiprocessor interconnection networks;performance evaluation;",
    	note = "fully adaptive routing;InfiniBand networks;processing nodes;interprocessor communication;distributed routing;network performance;network throughput;",
    	pages = "10 pp.",
    	title = "{S}upporting fully adaptive routing in {I}nfini{B}and networks",
    	url = "http://dx.doi.org/10.1109/IPDPS.2003.1213130",
    	year = 2003
    }
    
  32. Maria E Gomez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. VOQSW: a methodology to reduce HOL blocking in InfiniBand networks. In Parallel and Distributed Processing Symposium, 2003. Proceedings. International. 2003, 10 pp.. DOI BibTeX

    @conference{1213134,
    	author = "Gomez, Maria E. and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "InfiniBand is a new switch-based standard interconnect for communication between processor nodes and I/O devices as well as for interprocessor communication. InfiniBand architecture allows switches to support up to 15 virtual lanes per port for data traffic. To route packets through a given virtual lane (VL), packets are labeled with a certain service level (SL) at injection time, and SLtoVL mapping tables are used at each switch to determine the VL to be used. Many previous works in the literature have shown that separate virtual lanes are able to reduce the influence of the well-known head-of-line (HOL) blocking effect on network performance. However, using virtual lanes to form separate virtual networks is not enough to eliminate the HOL blocking problem. Alternative solutions such as Virtual Output Queuing (VOQ) are able to eliminate it at the expense of modifying the switch buffer organization. In this paper, we propose an effective strategy to implement the VOQ scheme in IBA switches by using virtual lanes. This strategy does not require to modify the switch architecture, simply SL to VL tables must be properly filled. Evaluation results show that our proposed VOQ scheme is able to outperform the results obtained with the virtual network approach using the same number of resources. Moreover, the methodology proposed to implement the VOQ scheme in IBA only requires a small number of resources in order to significantly improve network throughput.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2003. Proceedings. International",
    	doi = "10.1109/IPDPS.2003.1213134",
    	keywords = "HOL blocking; InfiniBand networks; SL to VL mapping tables; head-of-line blocking effect; interprocessor communication; network performance; network throughput; switch buffer organization; switch-based standard interconnect; virtual lane; virtual output",
    	month = "22-26",
    	pages = "10 pp.",
    	title = "{VOQSW}: a methodology to reduce {HOL} blocking in {I}nfini{B}and networks",
    	year = 2003
    }
    
  33. JC Sancho, Antonio Robles, Pedro Lopez, Jose Flich and Jose Duato. Routing in InfiniBand (TM) torus network topologies. In P Sadayappan and CS Yang (eds.). 2003 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS. 2003, 509-518. BibTeX

    @conference{ISI:000186828800056,
    	author = "JC Sancho and Robles, Antonio and Lopez, Pedro and Flich, Jose and Duato, Jose",
    	abstract = "InfiniBand is an interconnect standard for communication between processing nodes and I/O devices as well as for interprocessor communication (NOWs). The InfiniBand Architecture (IBA) defines a switch-based network with point-to-point links whose topology can be established by the customer. When performance is the primary concern, regular topologies are preferred. Low-dimensional tori (2D and 3D) are some of the regular topologies most widely used in commercial parallel computers. Routing in a torus requires the use of virtual channels. Although InfiniBand provides support for deterministic routing and virtual channels, they are selected at each switch by service level (SL) identifiers associated with packets and do not depend on packet destination. This makes routing algorithm implementation more complex. In particular, a large number of SLs may be required, which is a scarce resource. In this paper, we analyze the way several routing strategies can be applied in tori InfiniBand networks, also evaluating their resource requirements. In particular, we analyze and compare the well-known e-cube and up{*}/down{*} routing algorithms and the Flexible routing algorithm recently proposed.",
    	booktitle = "2003 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, PROCEEDINGS",
    	editor = "Sadayappan, P and Yang, CS",
    	isbn = 0769520170,
    	note = "International Conference on Parallel Processing, KAOHSIUNG, TAIWAN, OCT 06-09, 2003",
    	pages = "509-518",
    	title = "{R}outing in {I}nfini{B}and ({TM}) torus network topologies",
    	year = 2003
    }
    
  34. Juan Carlos Martinez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Supporting adaptive routing in IBA switches. Journal of Systems Architecture 49(10-11):441 - 456, 2003. URL BibTeX

    @article{2003487758791,
    	author = "Martinez, Juan Carlos and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "InfiniBand is a new standard for communication between processing nodes and I/O devices as well as for interprocessor communication. The InfiniBand Architecture (IBA) supports distributed deterministic routing because forwarding tables store a single output port per destination ID. This prevents packets from using alternative paths when the requested output port is busy. Despite the fact that alternative paths could be selected at the source node to reach the same destination node, this is not effective enough to improve network performance. However, using adaptive routing could help to circumvent the congested areas in the network, leading to an increment in performance. In this paper, we propose a simple strategy to implement forwarding tables for IBA switches that supports adaptive routing while still maintaining compatibility with the IBA specs. Adaptive routing can be individually enabled or disabled for each packet at the source node. The proposed strategy enables the use in IBA of any adaptive routing algorithm with an acyclic channel dependence graph. In this paper, we have taken advantage of the partial adaptivity provided by the well-known up*/down* routing algorithm. Evaluation results show that extending IBA switch capabilities with adaptive routing may noticeably increase network performance. In particular, network throughput improvement can be, on average, as high as 66%. © 2003 Elsevier B.V. All rights reserved.",
    	issn = 13837621,
    	journal = "Journal of Systems Architecture",
    	key = "Systems engineering",
    	keywords = "Algorithms;Communication;Information technology;Switches;Telecommunication networks;",
    	note = "Adaptive routing;",
    	number = "10-11",
    	pages = "441 - 456",
    	title = "{S}upporting adaptive routing in {IBA} switches",
    	url = "http://dx.doi.org/10.1016/S1383-7621(03)00103-6",
    	volume = 49,
    	year = 2003
    }
    
  35. J C Sancho, Juan Carlos Martinez, Antonio Robles, Pedro Lopez, Jose Flich and Jose Duato. Performance evaluation of COWS under real parallel applications. In Parallel and Distributed Processing Symposium, 2003. Proceedings. International. 2003, 10 pp.. DOI BibTeX

    @conference{1213371,
    	author = "J.C. Sancho and Martinez, Juan Carlos and Robles, Antonio and Lopez, Pedro and Flich, Jose and Duato, Jose",
    	abstract = "Clusters of workstations (COWS) are often arranged as a switch-based network with irregular topology. Usually, the evaluation of interconnection networks for COWS has been carried out by simulation using synthetic traffic and by traces from real parallel applications. Although both types of traffic are used as a first approximation of the behavior of the system, a more accurate behavior can be obtained by using real parallel applications. In this paper, a new simulation framework has been developed in order to evaluate interconnection networks under real parallel applications by using an execution-driven simulator. Moreover, the new simulator can be used to evaluate the impact on the performance of the whole system of several design parameters in addition to the interconnection network. Evaluation results show that the execution time of real parallel applications can be reduced by using an effective routing algorithm. Moreover, in some cases, the achieved improvements are higher than the ones achieved by improving other design issues, such as the processor instruction issue rate, the cache size or the network bandwidth.",
    	booktitle = "Parallel and Distributed Processing Symposium, 2003. Proceedings. International",
    	doi = "10.1109/IPDPS.2003.1213371",
    	issn = "1530-2075",
    	keywords = "COWS; cache size; clusters of workstations; execution-driven simulator; interconnection networks; network bandwidth; performance evaluation; processor instruction issue rate; simulation framework; switch-based network; discrete event simulation; performa",
    	month = "22-26",
    	pages = "10 pp.",
    	title = "{P}erformance evaluation of {COWS} under real parallel applications",
    	year = 2003
    }
    
  36. Maria E Gomez, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Evaluation of routing algorithms for InfiniBand networks. 2002, 775 - 80. BibTeX

    @conference{7568237,
    	author = "Gomez, Maria E. and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	abstract = "Storage area networks (SAN) provide the scalability required by the IT servers. The InfiniBand (IBA) interconnect is very likely to become the de facto standard for SAN as well as for NOW. The routing algorithm is a key design issue in irregular networks. Moreover, as several virtual lanes can be used and different network issues can be considered, the performance of the routing algorithms may be affected. In this paper we evaluate three existing routing algorithms (up*/down*, DFS, and smart-routing) suitable for being applied to IBA. Evaluation has been performed by simulation under different synthetic traffic patterns and I/O traces. Simulation results show that the smart-routing algorithm achieves the highest performance",
    	address = "Berlin, Germany",
    	journal = "Euro-Par 2002 Parallel Processing. 8th International Euro-Par Conference. Proceedings (Lecture Notes in Computer Science Vol.2400)",
    	keywords = "parallel algorithms;performance evaluation;telecommunication network routing;telecommunication standards;telecommunication traffic;workstation clusters;",
    	note = "routing algorithms;InfiniBand networks;storage area networks;SAN;scalability;de facto standard;IBA interconnect;NOW;irregular networks;virtual lanes;performance;up*/down* routing;DFS routing;smart routing;synthetic traffic patterns;I/O traces;simulation;IT servers;",
    	pages = "775 - 80",
    	title = "{E}valuation of routing algorithms for {I}nfini{B}and networks",
    	year = 2002
    }
    
  37. JC Sancho, Antonio Robles and Jose Duato. Performance sensitivity of routing algorithms to failures in networks of workstations with regular and irregular topologies. In F Vajda and N Podhorszki (eds.). 10TH EUROMICRO WORKSHOP ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, PROCEEDINGS. 2002, 81-90. BibTeX

    @conference{ISI:000173566600010,
    	author = "JC Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) provide a cost-effective alternative to parallel computers. Components in NOWs may fail, degrading the network operation until the faults are repaired. In this paper, we analyze the influence of both switch and link failures on the network performance. In particular, given that network performance in NOWs strongly depends on the applied routing algorithm, we quantify the sensitivity to failures of two routing algorithms: flexible routing and up{*}/down{*} routing algorithms. In the case of up{*}/down{*} routing, two methodologies to compute routing tables are evaluated. Evaluation results modeling a Myrinet network show that, in general, up{*}/down{*} routing is more robust to failures, although its behavior strongly depends on the type of network topology, regular or irregular, and the methodology used to compute routing tables. However, the flexible routing algorithm presents a better performance, regardless of the network topology, even in presence of failures, but at expense of a larger sensitivity.",
    	booktitle = "10TH EUROMICRO WORKSHOP ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, PROCEEDINGS",
    	editor = "Vajda, F and Podhorszki, N",
    	isbn = 0769514448,
    	note = "10th Euromicro Workshop on Parallel, Distributed and Network-based Processing (PDP 2002), LAS PALMAS GC, SPAIN, JAN 09-11, 2002",
    	pages = "81-90",
    	title = "{P}erformance sensitivity of routing algorithms to failures in networks of workstations with regular and irregular topologies",
    	year = 2002
    }
    
  38. J C Sancho, Antonio Robles and Jose Duato. Performance sensitivity of routing algorithms to failures in networks of workstations with regular and irregular topologies. 2002, 81 - 90. URL BibTeX

    @conference{7205079,
    	author = "J.C. Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) provide a cost-effective alternative to parallel computers. Components in NOWs may fail, degrading the network operation until the faults are repaired. In this paper, we analyze the influence of both switch and link failures on the network performance. In particular, given that network performance in NOWs strongly depends on the applied routing algorithm, we quantify the sensitivity to failures of two routing algorithms: flexible routing and up*/down* routing algorithms. In the case of up*/down* routing, two methodologies to compute routing tables are evaluated. Evaluation results modeling a Myrinet network show that, in general, up*/down* routing is more robust to failures, although its behavior strongly depends on the type of network topology, regular or irregular, and the methodology used to compute routing tables. However, the flexible routing algorithm presents a better performance, regardless of the network topology, even in presence of failures, but at expense of a larger sensitivity",
    	address = "Los Alamitos, CA, USA",
    	journal = "Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing",
    	keywords = "computer networks;performance evaluation;workstation clusters;",
    	note = "performance sensitivity;routing algorithms;networks of workstations;irregular topologies;regular topologies;link failures;switch failures;network performance;Myrinet network;",
    	pages = "81 - 90",
    	title = "{P}erformance sensitivity of routing algorithms to failures in networks of workstations with regular and irregular topologies",
    	url = "http://dx.doi.org/10.1109/EMPDP.2002.994237",
    	year = 2002
    }
    
  39. Jose Flich, Pedro Lopez, J C Sancho, Antonio Robles and Jose Duato. Improving InfiniBand routing through multiple virtual networks. 2002, 49 - 63. BibTeX

    @conference{7387421,
    	author = "Flich, Jose and Lopez, Pedro and J.C. Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "InfiniBand is very likely to become the de facto standard for communication between nodes and I/O devices as well as for interprocessor communication. Often, the interconnection pattern is irregular. Up*/down* is the most popular routing scheme currently used in NOWs with irregular topologies. However, the main drawbacks of up*/down* routing are the unbalanced channel utilization and the difficulties to route most packets through minimal paths, which negatively affects network performance. Using additional virtual lanes can improve up*/down* routing performance by reducing the head-of-line blocking effect, but its use is not aimed to remove its main drawbacks. We propose a methodology that uses a reduced number of virtual lanes in an efficient way to achieve a better traffic balance and a higher number of minimal paths. This methodology is based on routing packets simultaneously through several properly selected up*/down* trees. To guarantee deadlock freedom, each up*/down* tree is built over a different virtual network. Simulation results show that the proposed methodology increases throughput up to an average factor ranging from 1.18 to 2.18 for 8, 16, and 32-switch networks by using only two virtual lanes. For larger networks with an additional virtual lane, network throughput is tripled, on average",
    	address = "Berlin, Germany",
    	journal = "High Performance Computing. 4th International Symposium, ISHPC 2002. Proceedings (Lecture Notes in Computer Science Vol.2327)",
    	keywords = "multiplexing;multiprocessor interconnection networks;telecommunication network routing;workstation clusters;",
    	note = "InfiniBand routing;networks of workstations;multiple virtual networks;interprocessor communication;NOWs;switch-based network;point-to-point links;up*/down* routing;head-of-line blocking effect;deadlock freedom;",
    	pages = "49 - 63",
    	title = "{I}mproving {I}nfini{B}and routing through multiple virtual networks",
    	year = 2002
    }
    
  40. J C Sancho, Antonio Robles, Jose Flich, Pedro Lopez and Jose Duato. Effective methodology for deadlock-free minimal routing in InfiniBand networks. In Parallel Processing, 2002. Proceedings. International Conference on. 2002, 409 - 418. DOI BibTeX

    @conference{1040897,
    	author = "J.C. Sancho and Robles, Antonio and Flich, Jose and Lopez, Pedro and Duato, Jose",
    	abstract = "The InfiniBand Architecture (IBA) defines a switch-based network with point-to-point links whose topology is arbitrarily established by the customer. We propose a simple and effective methodology for designing deadlock-free routing strategies that are able to route packets through minimal paths in InfiniBand networks. This methodology can meet the trade-off between network performance and the number of resources dedicated to deadlock avoidance. Evaluation results show that the resulting routing strategies significantly outperform up*/down* routing. In particular, throughput improvement ranges, on average, from 1.33 for small networks to 4.05 for large networks. Also, it is shown that just two virtual lanes and three service levels are enough to achieve more than 80% of the throughput improvement achieved by the best proposed routing strategy (the one that always provides minimal paths without limiting the number of resources).",
    	booktitle = "Parallel Processing, 2002. Proceedings. International Conference on",
    	doi = "10.1109/ICPP.2002.1040897",
    	issn = "0190-3918",
    	keywords = "InfiniBand architecture; InfiniBand networks; NOWs; deadlock-free minimal routing; interconnection pattern; minimal paths; network performance; packet routing; point-to-point links; service levels; switch-based network; throughput improvement; up*/down*",
    	pages = "409 - 418",
    	title = "{E}ffective methodology for deadlock-free minimal routing in {I}nfini{B}and networks",
    	year = 2002
    }
    
  41. J C Sancho, Jose Flich, Antonio Robles, Pedro Lopez and Jose Duato. Analyzing the influence of virtual lanes on the performance of InfiniBand networks. In Parallel and Distributed Processing Symposium., Proceedings International, IPDPS 2002, Abstracts and CD-ROM. 2002, 166 - 175. BibTeX

    @conference{1016568,
    	author = "J.C. Sancho and Flich, Jose and Robles, Antonio and Lopez, Pedro and Duato, Jose",
    	booktitle = "Parallel and Distributed Processing Symposium., Proceedings International, IPDPS 2002, Abstracts and CD-ROM",
    	pages = "166 - 175",
    	title = "{A}nalyzing the influence of virtual lanes on the performance of {I}nfini{B}and networks",
    	year = 2002
    }
    
  42. JC Sancho, Antonio Robles and Jose Duato. On the relative behavior of source and distributed routing in NOWs using up*/down* routing schemes. In K Klockner (ed.). NINTH EUROMICRO WORKSHOP ON PARALLEL AND DISTRIBUTED PROCESSING, PROCEEDINGS. 2001, 11-18. BibTeX

    @conference{ISI:000166833400002,
    	author = "JC Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are arranged as a switch-based network with irregular topology, which makes routing and deadlock avoidance quite complicated. Current proposals use the up{*}/down{*} routing algorithm to remove cyclic dependencies between channels and avoid deadlock. Recently, a simple and effective methodology to compute up{*}/down{*} routing tables has been proposed by us. The resulting up{*}/down{*} routing scheme increases the number of alternative paths between every pair of switches and allows most messages to follow minimal paths. Also, up{*}/down{*} routing is suitable to be implemented using source or distributed routing. Source routing provides a safer and lower cost implementation of up{*}/down{*} routing than that provided by distributed routing. However, distributed routing may benefit from routing messages through alternative paths to reach their destination. In this paper, we evaluate the performance of up{*}/down{*} routing when using two methodologies to compute routing tables, and when both source and distributed routing are used. Evaluation results show that it is not worth implementing up{*}/down{*} routing in a distributed way in a NOW environment, since its performance is very close to that achieved by implementing it with source routing when a traffic-balancing algorithm is used. Moreover, it is shown that a greater improvement in performance can be achieved by modifying the method to compute up{*}/down{*} routing tables when source routing is used.",
    	booktitle = "NINTH EUROMICRO WORKSHOP ON PARALLEL AND DISTRIBUTED PROCESSING, PROCEEDINGS",
    	editor = "Klockner, K",
    	isbn = 0769509886,
    	note = "9th Euromicro Workshop on Parallel and Distributed Processing, MANTOVA, ITALY, FEB 07-09, 2001",
    	pages = "11-18",
    	title = "{O}n the relative behavior of source and distributed routing in {NOW}s using up{*}/down{*} routing schemes",
    	year = 2001
    }
    
  43. JC Sancho, Antonio Robles and Jose Duato. Effective strategy to compute forwarding tables for InfiniBand networks. In LM Ni and M Valero (eds.). PROCEEDINGS OF THE 2001 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING. 2001, 48-57. BibTeX

    @conference{ISI:000171882100006,
    	author = "JC Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "InfiniBand is very likely to become the de facto standard for communication between processing nodes and I/O devices as well as for interprocessor communication. The InfiniBand Architecture (IBA) defines a switch-based network with point-to-point links that support any topology defined by the user. Routing in IBA is distributed, based on forwarding tables, and only considers the packet destination ID for routing within subnets. Up{*}/down{*} routing is the simplest and most popular routing algorithm for irregular topologies. Unfortunately, up{*}/down{*} routing cannot be used in IBA switches because it may lead to deadlock. In this paper, we address this issue, proposing an easy-to-implement strategy to compute up{*}/down{*} forwarding tables for IBA switches that guarantees deadlock freedom, and is effective whatever the methodology applied to compute up{*}/down{*} routing tables. Preliminary evaluation results modeling an InfiniBand network at register transfer level show that the proposed strategy allows up{*}/down{*} routing algorithms to be implemented on InfiniBand networks with minimal performance degradation.",
    	booktitle = "PROCEEDINGS OF THE 2001 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING",
    	editor = "Ni, LM and Valero, M",
    	isbn = 0769512585,
    	issn = "0190-3918",
    	note = "30th International Conference on Parallel Processing (ICPP 01), VALENCIA, SPAIN, SEP 03-07, 2001",
    	pages = "48-57",
    	series = "PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING",
    	title = "{E}ffective strategy to compute forwarding tables for {I}nfini{B}and networks",
    	year = 2001
    }
    
  44. J C Sancho, Antonio Robles and Jose Duato. On the relative behavior of source and distributed routing in NOWs using Up*/Down* routing schemes. 2001, 11 - 18. URL BibTeX

    @conference{6867162,
    	author = "J.C. Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are arranged as a switch-based network with irregular topology, which makes routing and deadlock avoidance quite complicated. Current proposals use the up*/down* routing algorithm to remove cyclic dependencies between channels and avoid deadlock. Recently, a simple and effective methodology to compute up*/down* routing tables has been proposed by us. The resulting up*/down* routing scheme increases the number of alternative paths between every pair of switches and allows most messages to follow minimal paths. Also, up*/down* routing is suitable to be implemented using source or distributed routing. Source routing provides a safer and lower cost implementation of up*/down* routing than that provided by distributed routing. However distributed routing may benefit from routing messages through alternative paths to reach their destination. In this paper we evaluate the performance of up*/down* routing when using two methodologies to compute routing tables, and when both source and distributed routing are used. Evaluation results show that it is not worth to implement up*/down* routing in a distributed way in a NOW environment, since its performance is very close to that achieved by implementing it with source routing when a traffic-balancing algorithm is used. Moreover it is shown that a greater improvement in performance can be achieved by modifying the method to compute up*/down* routing tables when source routing is used",
    	address = "Los Alamitos, CA, USA",
    	journal = "Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing",
    	keywords = "network routing;performance evaluation;system recovery;workstation clusters;",
    	note = "distributed routing;NOWs;Up*/Down* routing schemes;networks of workstations;switch-based network;irregular topology;deadlock avoidance;cyclic dependencies;lower cost implementation;performance;",
    	pages = "11 - 18",
    	title = "{O}n the relative behavior of source and distributed routing in {NOW}s using {U}p*/{D}own* routing schemes",
    	url = "http://dx.doi.org/10.1109/EMPDP.2001.904962",
    	year = 2001
    }
    
  45. J C Sancho, Antonio Robles and Jose Duato. Effective strategy to compute forwarding tables for InfiniBand networks. 2001, 48 - 57. BibTeX

    @conference{7081877,
    	author = "J.C. Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "InfiniBand is very likely to become the de facto standard for communication between processing nodes and I/O devices as well as for interprocessor communication. The InfiniBand Architecture (IBA) defines a switch-based network with point-to-point links that support any topology defined by the user. Routing in IBA is distributed, based on forwarding tables, and only considers the packet destination ID for routing within subnets. Up*/down* routing is the simplest and most popular routing algorithm for irregular topologies. Unfortunately, up*/down* routing cannot be used in IBA switches because it may lead to deadlock. In this paper, we address this issue, proposing an easy-to-implement strategy to compute up*/down* forwarding tables for IBA switches that guarantees deadlock freedom, and is effective whatever the methodology applied to compute up*/down* routing tables. Preliminary evaluation results modeling an InfiniBand network at register transfer level show that the proposed strategy allows up*/down* routing algorithms to be implemented on InfiniBand networks with minimal performance degradation",
    	address = "Los Alamitos, CA, USA",
    	journal = "Proceedings International Conference on Parallel Processing",
    	keywords = "concurrency control;multiprocessor interconnection networks;network routing;performance evaluation;",
    	note = "forwarding tables;infiniBand networks;processing nodes;I/O devices;interprocessor communication;switch-based network;point-to-point links;packet destination;easy-to-implement strategy;register transfer level;minimal performance degradation;",
    	pages = "48 - 57",
    	title = "{E}ffective strategy to compute forwarding tables for {I}nfini{B}and networks",
    	year = 2001
    }
    
  46. Jose Duato, Antonio Robles, Federico Silla and R Beivide. A Comparison of Router Architectures for Virtual Cut-Through and Wormhole Switching in a NOW Environment. Journal of Parallel and Distributed Computing 61(2):224 - 253, 2001. URL BibTeX

    @article{2004488488316,
    	author = "Duato, Jose and Robles, Antonio and Silla, Federico and R. Beivide",
    	abstract = "Most multicomputer interconnection networks use wormhole switching, leading to fast and compact routers. Current routers incorporate virtual channels and even fully adaptive routing. Networks of workstations (NOWs) inherited multicomputer technology. Most commercial routers designed for NOWs implement wormhole switching. However, wormhole switching is not well suited for NOWs. The long wires required in this environment lead to large buffers to prevent buffer overflow during flow control signaling. Moreover, wire length is limited by buffer size. Virtual cut-through (VCT) achieves a higher throughput than wormhole switching. However, buffer requirements and packetizing overhead prevented its widespread use in multicomputers. Nevertheless, wormhole and VCT switching require similar buffer capacity in NOWs. Moreover, some messaging layers such as Illinois Fast Messages (FM) and BIP split messages into packets for increased performance. Therefore, the traditional disadvantages of VCT switching disappear in NOWs. In this paper, we show that VCT routers can be simpler than wormhole routers, while still achieving the advantages of using virtual channels and adaptive routing. We also propose a fully adaptive routing algorithm for VCT switching in a NOW environment. Moreover, we show that VCT routers outperform wormhole routers in a NOW environment at a lower cost. Also, VCT routers require buffer capacity independent of wire length, making them suitable for networks of workstations. © 2001 Academic Press.",
    	address = "Orlando, United States",
    	issn = 07437315,
    	journal = "Journal of Parallel and Distributed Computing",
    	number = 2,
    	pages = "224 - 253",
    	title = "{A} {C}omparison of {R}outer {A}rchitectures for {V}irtual {C}ut-{T}hrough and {W}ormhole {S}witching in a {NOW} {E}nvironment",
    	url = "http://dx.doi.org/10.1006/jpdc.2000.1679",
    	volume = 61,
    	year = 2001
    }
    
  47. JC Sancho and Antonio Robles. Improving the up*/down* routing scheme for networks of workstations. In A Bode, T Ludwig, W Karl and R Wismuller (eds.). EURO-PAR 2000 PARALLEL PROCESSING, PROCEEDINGS 1900. 2000, 882-889. BibTeX

    @conference{ISI:000189042500123,
    	author = "JC Sancho and Robles, Antonio",
    	abstract = "Networks of workstations (NOWs) are being considered as a cost-effective alternative to parallel computers. Many NOWs are arranged as a switch-based network with irregular topology, which makes routing and deadlock avoidance quite complicated. Current proposals use the up{*}/down{*} routing algorithm to remove cyclic dependencies between channels and avoid deadlock. Recently, a simple and effective methodology to compute up{*}/down{*} routing tables has been proposed by us. The resulting up{*}/down{*} routing scheme makes use of a different link direction assignment to compute routing tables. Assignment of link direction is based on generating an underlying acyclic connected graph from the network graph. In this paper, we propose and evaluate new heuristic rules to compute the underlying graph. Moreover, we propose a traffic balancing algorithm to obtain more efficient up{*}/down{*} routing tables when source routing is used. Evaluation results show that the routing algorithm based on the new methodology increases throughput by a factor of up to 2.8 in large networks, also reducing latency significantly.",
    	booktitle = "EURO-PAR 2000 PARALLEL PROCESSING, PROCEEDINGS",
    	editor = "Bode, A and Ludwig, T and Karl, W and Wismuller, R",
    	isbn = 3540679561,
    	issn = "0302-9743",
    	note = "6th International Euro-Par 2000 Conference, MUNICH, GERMANY, AUG 29-SEP 01, 2000",
    	pages = "882-889",
    	series = "LECTURE NOTES IN COMPUTER SCIENCE",
    	title = "{I}mproving the up{*}/down{*} routing scheme for networks of workstations",
    	volume = 1900,
    	year = 2000
    }
    
  48. J C Sancho and Antonio Robles. Improving the up*/down* routing scheme for networks of workstations. 2000, 882 - 9. BibTeX

    @conference{6905305,
    	author = "J.C. Sancho and Robles, Antonio",
    	abstract = "Networks of workstations (NOWs) are being considered as a cost-effective alternative to parallel computers. Current proposals use the up*/down* routing algorithm to remove cyclic dependencies between channels and avoid deadlock. A simple and effective methodology to compute up*/down* routing tables has been proposed by Sancho et al. (2000). The resulting up*/down* routing scheme makes use of a different link direction assignment to compute routing tables. Assignment of link direction is based on generating an underlying acyclic connected graph from the network graph. In this paper, we propose and evaluate new heuristic rules to compute the underlying graph. Moreover, we propose a frame balancing algorithm to obtain more efficient up*/down* routing tables when source routing is used. Evaluation results show that the routing algorithm based on the new methodology increases throughput by a factor of up to 2.8 in large networks, also reducing latency significantly",
    	address = "Berlin, Germany",
    	journal = "Euro-Par 2000 Parallel Processing. 6th International Euro-Par Conference. Proceedings (Lecture Notes in Computer Science Vol.1900)",
    	keywords = "network routing;network topology;optimisation;parallel machines;workstation clusters;",
    	note = "up down routing;workstation networks;parallel computers;link direction assignment;acyclic connected graph;heuristic rules;frame balancing;irregular topology;deadlock avoidance;spanning trees;",
    	pages = "882 - 9",
    	title = "{I}mproving the up*/down* routing scheme for networks of workstations",
    	year = 2000
    }
    
  49. JC Sancho, Antonio Robles and Jose Duato. Improving minimal adaptive routing in networks with irregular topology. In G Chaudhry and E Sha (eds.). PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS. 2000, 314-319. BibTeX

    @conference{ISI:000179773600050,
    	author = "JC Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are being considered as a cost-effective alternative to parallel computers. Many NOWs are arranged as a switch-based network with irregular topology, which makes routing and deadlock avoidance quite complicated. Several current proposals, like up{*}/down{*} routing, avoid deadlock by removing cyclic dependencies between channels. A more efficient approach consists of allowing cyclic dependencies between channels while providing some escape paths to avoid deadlock. Minimal adaptive routing (MA) is a distributed adaptive routing algorithm that is able to use all the minimal paths and guarantees deadlock freedom by using up{*}/down{*} routing to route messages through the escape paths. Recently, a simple and effective methodology to compute up{*}/down{*} routing tables has been proposed by us. The resulting up{*}/down{*} routing scheme makes use of a different link direction assignment to compute routing tables. Assignment of link direction is based on generating an underlying acyclic connected graph from the network graph. In this paper, we analyze the influence of using the new methodology to compute up{*}/down{*} routing tables on the performance of the minimal adaptive routing algorithm. Evaluation results show that when the methodology to compute up{*}/down{*} routing tables is combined with minimal adaptive routing, an improvement in throughput of up to 40\% is achieved, also reducing latency.",
    	booktitle = "PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS",
    	editor = "Chaudhry, G and Sha, E",
    	isbn = "188084334X",
    	note = "13th International Conference on Parallel and Distributed Computing Systems, LAS VEGAS, NV, AUG 08-10, 2000",
    	pages = "314-319",
    	title = "{I}mproving minimal adaptive routing in networks with irregular topology",
    	year = 2000
    }
    
  50. JC Sancho, Antonio Robles and Jose Duato. A new methodology to compute deadlock-free routing tables for irregular networks. In B Falsafi and M Lauria (eds.). NETWORK-BASED PARALLEL COMPUTING, PROCEEDINGS - COMMUNICATION, ARCHITECTURE, AND APPLICATIONS 1797. 2000, 45-60. BibTeX

    @conference{ISI:000171691200004,
    	author = "JC Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are being considered as a cost-effective alternative to parallel computers. Many NOWs are arranged as a switch-based network with irregular topology, which makes routing and deadlock avoidance quite complicated. Current proposals use the up{*}/down{*} routing algorithm to remove cyclic dependencies between channels and avoid deadlock. However, routing is considerably restricted and most messages must follow non-minimal paths, increasing latency and wasting resources. In this paper, we propose a new methodology to compute deadlock-free routing tables for NOWs. The methodology tries to minimize the limitations of the current proposals in order to improve network performance. It is based on generating an underlying acyclic connected graph from the network graph and assigning a sequence number to each switch, which is used to remove cyclic dependencies. Evaluation results show that the routing algorithm based on the new methodology increases throughput by a factor of up to 2 in large networks, also reducing latency significantly.",
    	booktitle = "NETWORK-BASED PARALLEL COMPUTING, PROCEEDINGS - COMMUNICATION, ARCHITECTURE, AND APPLICATIONS",
    	editor = "Falsafi, B and Lauria, M",
    	isbn = 3540678794,
    	issn = "0302-9743",
    	note = "4th International Workshop on Communication, Architecture, and Applications for Network-Based Parallel Computing (CANPC 2000), TOULOUSE, FRANCE, JAN 08, 2000",
    	pages = "45-60",
    	series = "LECTURE NOTES IN COMPUTER SCIENCE",
    	title = "{A} new methodology to compute deadlock-free routing tables for irregular networks",
    	volume = 1797,
    	year = 2000
    }
    
  51. J C Sancho, Antonio Robles and Jose Duato. A new methodology to compute deadlock-free routing tables for irregular networks. 2000, 45 - 60. BibTeX

    @conference{6826449,
    	author = "J.C. Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations (NOWs) are being considered as a cost-effective alternative to parallel computers. Many NOWs are arranged as a switch-based network with irregular topology, which makes routing and deadlock avoidance quite complicated. Current proposals use the up*/down* routing algorithm to remove cyclic dependencies between channels and avoid deadlock. However, routing is considerably restricted and most messages must follow non-minimal paths, increasing latency and wasting resources. In this paper, we propose a new methodology to compute deadlock-free routing tables for NOWs. The methodology tries to minimize the limitations of the current proposals in order to improve network performance. It is based on generating an underlying acyclic connected graph from the network graph and assigning a sequence number to each switch, which is used to remove cyclic dependencies. Evaluation results show that the routing algorithm based on the new methodology increases throughput by a factor of up to 2 in large networks, also reducing latency significantly",
    	address = "Berlin, Germany",
    	journal = "Network-Based Parallel Computing. Communication, Architecture, and Applications. 4th International Workshop, CANPC 2000. Proceedings (Lecture Notes in Computer Science Vol.1797)",
    	keywords = "performance evaluation;workstation clusters;",
    	note = "deadlock-free routing tables;irregular networks;networks of workstations;switch-based network;irregular topology;routing;deadlock avoidance;cyclic dependencies;latency;network performance;acyclic connected graph;network graph;",
    	pages = "45 - 60",
    	title = "{A} new methodology to compute deadlock-free routing tables for irregular networks",
    	year = 2000
    }
    
  52. J C Sancho, Antonio Robles and Jose Duato. A flexible routing scheme for networks of workstations. 2000, 260 - 7. BibTeX

    @conference{6977552,
    	author = "J.C. Sancho and Robles, Antonio and Duato, Jose",
    	abstract = "NOWs are arranged as a switch-based network which allows the layout of both regular and irregular topologies. However, the irregular pattern interconnect makes routing and deadlock avoidance quite complicated. Current proposals use the up*/down* routing algorithm to remove cyclic dependencies between channels and avoid deadlock. Recently, a simple and effective methodology to compute up*/down* routing tables has been proposed by us. The resulting routing algorithm is very effective in irregular topologies. However, its behavior is very poor in regular networks with orthogonal dimensions. Therefore, we propose a more flexible routing scheme that is effective in both regular and irregular topologies. Unlike up*/down* routing algorithms, the proposed routing algorithm breaks cycles at different nodes for each direction in the cycle, thus providing better traffic balancing than that provided by up*/down* routing algorithms. Evaluation results modeling a Myrinet network show that the new routing algorithm increases throughput with respect to the original up*/down* routing algorithm by a factor of up to 3.5 for regular networks, also maintaining the performance of the improved up*/down* routing scheme proposed in Sancho et al. (2000) when applied to irregular networks",
    	address = "Berlin, Germany",
    	journal = "High Performance Computing. Third International Symposium, ISHPC 2000. Proceedings (Lecture Notes in Computer Science Vol.1940)",
    	keywords = "concurrency control;network routing;network topology;performance evaluation;workstation clusters;",
    	note = "networks of workstations;routing scheme;NOW;regular topologies;irregular topologies;deadlock avoidance;traffic balancing;Myrinet network;performance;",
    	pages = "260 - 7",
    	title = "{A} flexible routing scheme for networks of workstations",
    	year = 2000
    }
    
  53. Jose Duato, Antonio Robles, Federico Silla and R Beivide. Comparison of router architectures for virtual cut-through and wormhole switching in a NOW environment. Proceedings of the International Parallel Processing Symposium, IPPS, pages 240 - 247, 1999. BibTeX

    @article{1999394752205,
    	author = "Duato, Jose and Robles, Antonio and Silla, Federico and R. Beivide",
    	abstract = "Most commercial routers designed for networks of workstations (NOWs) implement wormhole switching. However, wormhole switching is not well suited for NOWs. The long wires required in this environment lead to large buffers to prevent buffer overflow during flow control signaling. Moreover, wire length is limited by buffer size. Virtual cut-through (VCT) achieves a higher throughput than wormhole switching. Moreover, the traditional disadvantages of VCT switching, as buffer requirements and packetizing overhead, disappear in NOWs. In this paper, we show that VCT routers can be simpler than wormhole ones, while still achieving the advantages of using virtual channels and adaptive routing. We also propose a fully adaptive routing algorithm for VCT switching in NOWs. Moreover, we show that VCT routers outperform wormhole routers in a NOW environment at a lower cost.",
    	address = "San Juan",
    	issn = 10637133,
    	journal = "Proceedings of the International Parallel Processing Symposium, IPPS",
    	key = "Pipeline processing systems",
    	keywords = "Adaptive algorithms;Computer architecture;Computer workstations;Switching networks;",
    	note = "Virtual cut-through (VCT);Wormhole switching;",
    	pages = "240 - 247",
    	title = "{C}omparison of router architectures for virtual cut-through and wormhole switching in a {NOW} environment",
    	year = 1999
    }
    
  54. Jose Duato, Antonio Robles, Federico Silla and R Beivide. A comparison of router architectures for virtual cut-through and wormhole switching in a NOW environment. 1999, 240 - 7. URL BibTeX

    @conference{6245442,
    	author = "Duato, Jose and Robles, Antonio and Silla, Federico and R. Beivide",
    	abstract = "Most commercial routers designed for networks of workstations (NOWs) implement wormhole switching. However, wormhole switching is not well suited for NOWs. The long wires required in this environment lead to large buffers to prevent buffer overflow during flow control signaling. Moreover, wire length is limited by buffer size. Virtual cut-through (VCT) achieves a higher throughput than wormhole switching. Moreover, the traditional disadvantages of VCT switching, as buffer requirements and packetizing overhead, disappear in NOWs. In this paper, we show that VCT routers can be simpler than wormhole ones, while still achieving the advantages of using virtual channels and adaptive routing. We also propose a fully adaptive routing algorithm for VCT switching in NOWs. Moreover, we show that VCT routers outperform wormhole routers in a NOW environment at a lower cost",
    	address = "Los Alamitos, CA, USA",
    	journal = "Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999",
    	keywords = "multiprocessor interconnection networks;network routing;workstation clusters;",
    	note = "router architectures;virtual cut-through;wormhole switching;NOW environment;networks of workstations;buffer requirements;packetizing overhead;VCT routers;",
    	pages = "240 - 7",
    	title = "{A} comparison of router architectures for virtual cut-through and wormhole switching in a {NOW} environment",
    	url = "http://dx.doi.org/10.1109/IPPS.1999.760469",
    	year = 1999
    }
    
  55. Federico Silla, Antonio Robles and Jose Duato. Improving performance of networks of workstations by using Disha Concurrent. 1998, 80 - 7. URL BibTeX

    @conference{6034697,
    	author = "Silla, Federico and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations are currently emerging as a cost-effective alternative to parallel computers. Recently, deadlock recovery techniques have been shown to be an alternative to deadlock avoidance. Disha Concurrent is a progressive deadlock recovery scheme able to simultaneously redirect several deadlocked messages through a deadlock-free lane. Unlike deadlock avoidance techniques, Disha provides true fully adaptive routing without using virtual channels to guarantee deadlock freedom. In this paper, we analyze the application of Disha to networks of workstations. We propose an implementation of Disha on irregular networks that allows concurrent deadlock recovery, proving that this implementation is always able to recover from deadlock. A new switch organization and a new flow control protocol are proposed to support Disha. Performance evaluation results show that applying Disha to irregular networks increases network throughput by a factor of up to 3.5, and also reduces latency with regard to other routing algorithms based on deadlock avoidance techniques",
    	address = "Los Alamitos, CA, USA",
    	journal = "Proceedings. 1998 International Conference on Parallel Processing (Cat. No.98EX205)",
    	keywords = "concurrency control;local area networks;parallel processing;performance evaluation;system recovery;workstations;",
    	note = "performance improvement;networks of workstations;Disha Concurrent;deadlock recovery techniques;deadlock avoidance;flow control protocol;latency;",
    	pages = "80 - 7",
    	title = "{I}mproving performance of networks of workstations by using {D}isha {C}oncurrent",
    	url = "http://dx.doi.org/10.1109/ICPP.1998.708466",
    	year = 1998
    }
    
  56. Federico Silla, Antonio Robles and Jose Duato. Improving performance of networks of workstations by using Disha Concurrent. In TH Lai (ed.). 1998 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - PROCEEDINGS. 1998, 80-87. BibTeX

    @conference{ISI:000075698400010,
    	author = "Silla, Federico and Robles, Antonio and Duato, Jose",
    	abstract = "Networks of workstations are currently emerging as a cost-effective alternative to parallel computers. Recently, deadlock recovery techniques have been shown to be an alternative to deadlock avoidance. Disha Concurrent is a progressive deadlock recovery scheme able to simultaneously redirect several deadlocked messages through a deadlock-free lane. Unlike deadlock avoidance techniques, Disha provides true fully adaptive routing without using virtual channels to guarantee deadlock freedom. In this paper, we analyze the application of Disha to networks of workstations. We propose an implementation of Disha on irregular networks that allows concurrent deadlock recovery, proving that this implementation is always able to recover from deadlock. A new switch organization and a new flow control protocol are proposed to support Disha. Performance evaluation results show that applying Disha to irregular networks increases network throughput by a factor of up to 3.5, and also reduces latency with regard to other routing algorithms based on deadlock avoidance techniques.",
    	booktitle = "1998 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - PROCEEDINGS",
    	editor = "Lai, TH",
    	isbn = 0818686510,
    	issn = "0190-3918",
    	note = "International Conference on Parallel Processing (ICPP), MINNEAPOLIS, MN, AUG 10-14, 1998",
    	pages = "80-87",
    	series = "PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING",
    	title = "{I}mproving performance of networks of workstations by using {D}isha {C}oncurrent",
    	year = 1998
    }
    
  57. Antonio Robles and Jose Duato. Multilinks: a new approach to the design of adaptive routing algorithms for multicomputers. 1992, 405 - 10. BibTeX

    @conference{4214095,
    	author = "Robles, Antonio and Duato, Jose",
    	abstract = "A new methodology for the design of deadlock-free adaptive routing algorithms is proposed, which is based on the use of multilinks. This is a new concept consisting of a virtual link formed by several adjacent physical channels simultaneously reserved by the router. Through simulation, the paper investigates the performance of two adaptive strategies for wormhole routing based on multilinks, comparing them with static routing. All adaptive routing strategies outperformed static routing significantly. Different network sizes have been evaluated, showing that the relative improvement of adaptive routing with regard to static routing increases with the network size",
    	address = "Amsterdam, Netherlands",
    	journal = "Parallel and Distributed Computing in Engineering Systems. Proceedings of the IMACS/IFAC International Symposium",
    	keywords = "multiprocessor interconnection networks;switching theory;",
    	note = "adaptive routing algorithms;multicomputers;deadlock-free adaptive routing algorithms;multilinks;virtual link;physical channels;wormhole routing;network size;",
    	pages = "405 - 10",
    	title = "{M}ultilinks: a new approach to the design of adaptive routing algorithms for multicomputers",
    	year = 1992
    }
    

Thesis

Design of Efficient TLB-based Data Classification Mechanisms in Chip Multiprocessors. Alberto Ros, Antonio Robles, Maria E. Gómez (Processor Architecture)

Efficient techniques to provide scalability for token-based cache coherence protocols. Antonio Robles, Jose Duato (Routing Algorithms)

Efficient mechanisms to provide fault tolerance in interconnection networks for PC Clusters. Jose Flich, Antonio Robles (Fault Tolerance)