Processing data streams faster
Data stream processing is becoming ever more demanding and important. Nesime Tatbul, Professor of Advanced Data Management Systems at the Department of Computer Science of ETH Zurich, has received an IBM Faculty Award worth 40,000 US dollars for her work in the field of data stream management. In a joint project with IBM, she will study how new hardware platforms can help process data streams faster.
Most of the time, we care little about how data streams are processed. What matters is that the real-time display at the bus stop tells us correctly when the next bus will arrive. However, processing real-time traffic information is only one of the many uses of data stream management systems in our daily lives. Other areas of application include electronic securities trading, the surveillance of public spaces, quality control and regulation in manufacturing plants, and monitoring the water quality of rivers. They also include scientific applications such as analysing the data streams produced by the Large Hadron Collider (LHC) at CERN.
The continuing expansion of sensor networks is accompanied by a growing need to process large amounts of data. Nesime Tatbul explains, “A paradigm shift has occurred in data management in recent years. Previously, data were first of all stored in a database and then retrieved again on demand. Nowadays, data streams are often processed in real time.” Data stream management must therefore meet requirements that differ from those of traditional data management.
Hardware and software: successful together
The challenge in data stream management lies in processing large amounts of data rapidly. For example, one of Tatbul’s current projects with Credit Suisse is concerned with finding efficient methods for evaluating information from options exchanges. This involves data volumes of the order of hundreds of thousands of messages per second. Reducing the processing delay by even a couple of milliseconds can have a significant effect on profits.
However, faster processing is achievable only through better interaction between hardware and software. So-called Field Programmable Gate Arrays (FPGAs) play an important role in this. Unlike the fixed-function chips used in ordinary computers, these hardware components can be reprogrammed over and over again. As a result, the hardware can take over functions that would normally be carried out in software. FPGAs also allow many operations to run in parallel, which speeds up data processing. Besides FPGAs, other hardware platforms are also candidates for such applications. For Tatbul, it is clear that the future lies in developing hardware and software together, rather than layering software on top of existing hardware as in the past. Tatbul stresses, “This requires us software programmers to re-think and re-learn.”
Fruitful collaboration with IBM
Tatbul will use the award money from IBM to work on these tasks as part of a joint project. Together, the partners will test what various hardware components can do to accelerate the processing of data streams. For Tatbul, the exchange with IBM is decisive in this project. She stresses, “The future of high-performance data stream management systems lies in the successful combination of hardware and software. Through the collaboration with IBM, my team and I gain the opportunity to combine our knowledge of software with their expertise in the hardware area. Both sides benefit from this.” Tatbul emphasises that the exchange with IBM is very intensive. Just last summer, she says, one of her master’s students made a research visit to IBM Böblingen, where he worked on an FPGA project. “He is now bringing the expertise he gained there to the work we are doing here.” Researchers from industry and academia also meet regularly at joint workshops.
Intensive cooperation with industry – a successful concept
The collaboration with IBM is only one of several industry cooperation projects in which Tatbul is involved, because input from industry is indispensable in the field of data management. This is why Tatbul and her colleagues Donald Kossmann, Gustavo Alonso and Timothy Roscoe, who together with her make up the Systems Group, founded the Enterprise Computing Center (ECC).
Amadeus, SAP and Credit Suisse are currently involved in the ECC as industry partners. Close, long-term cooperation projects of this kind between industry and ETH Zurich are designed to help optimise the highly complex processing of large volumes of data. Data management appeals to Tatbul precisely because it can be applied in so many different areas. Tatbul stresses, “The connection to practical applications is very important to me. The opportunity to influence such a wide variety of areas through my work was one of the main reasons why I studied Computer Science.”