Wei Zhang


     DSL 481A
     Department of Computer Science
     Florida State University
     Tallahassee, FL 32306
     Email: wzhang@cs.fsu.edu

 

About Me

I am a Ph.D. student in the Department of Computer Science at Florida State University, working with Dr. van Engelen in the ACIS laboratory. I joined the program in fall 2004 after obtaining a master's degree from the Department of Computer Science at Baylor University, Waco, TX.

I have been working with Dr. van Engelen since spring 2005 on a table-driven XML processing methodology called TDX. TDX integrates XML well-formedness parsing, type checking and validation, and application data handling in a single approach, which significantly improved XML processing performance, as we demonstrated in our papers. TDX expedites XML parsing by pre-recording the states of an XML parser in tabular form at compile time and by using an efficient parsing engine based on a pushdown automaton at runtime. The tables are constructed automatically from XML schemas. Beyond raw performance, TDX offers a high level of modularity (hot-swappable tables) and adaptiveness for developing XML-based applications, and provides a mechanism for encoding application-specific events into the XML parser.
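The table-driven idea can be illustrated with a small sketch. This is not the TDX implementation (the toolkit generates C code from real schemas); it is a minimal Python illustration that assumes a hypothetical toy schema in which an order element contains one id followed by one or more item elements. The table contents, the tokenizer, and all names below are assumptions made for the example.

```python
import re

# Hypothetical parsing table for a toy schema (an assumption for this sketch):
# <order> holds exactly one <id> followed by one or more <item> elements.
# Keys are (nonterminal, lookahead token); values are the symbols to push.
TABLE = {
    ("ORDER", "<order>"): ["<order>", "ID", "ITEMS", "</order>"],
    ("ID", "<id>"): ["<id>", "TEXT", "</id>"],
    ("ITEMS", "<item>"): ["ITEM", "MORE"],
    ("MORE", "<item>"): ["ITEM", "MORE"],
    ("MORE", "</order>"): [],  # no further items
    ("ITEM", "<item>"): ["<item>", "TEXT", "</item>"],
}
NONTERMINALS = {nonterminal for nonterminal, _ in TABLE}

TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tokenize(xml):
    """Yield tags verbatim and collapse non-blank character data to 'TEXT'."""
    for match in TOKEN_RE.finditer(xml):
        piece = match.group().strip()
        if piece:
            yield piece if piece.startswith("<") else "TEXT"


def validate(xml, start="ORDER"):
    """Parse and validate in a single pass with a stack and the table."""
    stack = [start]
    for token in tokenize(xml):
        while True:
            if not stack:
                return False  # input continues after the root element closed
            top = stack.pop()
            if top in NONTERMINALS:
                rule = TABLE.get((top, token))
                if rule is None:
                    return False  # the schema does not allow this token here
                stack.extend(reversed(rule))
            elif top == token:
                break  # terminal matched; move on to the next token
            else:
                return False
    return not stack  # every expected symbol must have been consumed
```

For example, `validate("<order><id>7</id><item>a</item></order>")` accepts the document, while a document without the id element is rejected as soon as the unexpected item tag is read. Because the engine only consults the table, swapping in a table built from another schema changes the accepted language without touching the engine.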


Research

My research is mainly centered on High-Performance Computing, XML Web Services, Grid Computing, and Cloud Computing. My research on Grid and Web services aims to simplify Grid and Web access, and improve the performance through compiler technologies.

I have developed a code generator toolkit that automatically generates C/C++ code for the Table-Driven XML Parser (TDX).

Another research interest I am currently investigating is the encoding of XPath expressions into TDX parsing tables. This makes it possible to implement a high-performance XML filtering system that inspects and extracts relevant XML data, such as WS-Security tokens for authentication. An XML filtering system consists of an XPath Expression Processor (XPP) and an XPath engine. The XPP processes XPath expressions and marks the TDX parsing table accordingly. The XPath engine is similar to the TDX engine: it performs parsing and validation, and delivers the XML data that match the query specifications representing the data interests of users or the application.
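The division of labor between the XPP and the engine can be sketched as follows. This is only an illustration under simplifying assumptions, not the actual system: the expressions are restricted to absolute child-axis paths such as /Envelope/Header/Token, element paths stand in for marked parsing-table entries, and the function names compile_paths and filter_xml are hypothetical.

```python
import re

# Matches a plain start tag, a plain end tag, or a run of character data.
# Attributes and self-closing tags are outside the scope of this sketch.
TOKEN_RE = re.compile(r"<(/?)([^<>/\s]+)>|([^<]+)")


def compile_paths(xpaths):
    """Stand-in for the XPP: turn simple absolute paths into marked states."""
    return {tuple(path.strip("/").split("/")) for path in xpaths}


def filter_xml(xml, marked):
    """Stand-in for the engine: one pass that delivers text in marked states."""
    path = []      # stack of currently open elements
    matches = []
    for closing, name, text in TOKEN_RE.findall(xml):
        if name and not closing:
            path.append(name)
        elif name:
            if not path or path[-1] != name:
                raise ValueError(f"unexpected end tag </{name}>")
            path.pop()
        elif text.strip() and tuple(path) in marked:
            matches.append(("/" + "/".join(path), text.strip()))
    return matches
```

Calling `filter_xml(message, compile_paths(["/Envelope/Header/Token"]))` returns only the text of Token elements found under the header and skips tokens with the same name elsewhere in the message. The marking is computed once, before any message arrives, so the per-message cost is a set lookup on the current state.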


Curriculum Vitae

[pdf]

Contact: 
Department of Computer Science
Tallahassee, FL 32306
Cellular Phone: (850)980-2614
Email: wzhang@cs.fsu.edu
Research Interest:
Education:
Awards:
Internship:
Employment History:

Research Experience:

Teaching Experience:

Publications:

  1. W. Zhang and R. van Engelen. High-Performance XML Parsing and Validation with Permutation Phrase Grammar Parsers. In Proceedings of IEEE International Conference on Web Services (ICWS), pages 286-294, Beijing, China, September 23-26, 2008.
    • Abstract: XML has become the de facto standard for exchanging structured information on the Web and for delivering rich content to users and business information processing systems. The extensibility, flexibility, expressiveness, and platform-neutrality of XML deliver key advantages for interoperability. Thus, industrial-strength interoperable XML-based Web services standards are widely adopted by organizations to deploy service-oriented architectures that comprise tens to thousands of loosely-coupled services. However, the interoperability of XML Web services comes at the price of reduced efficiency of message composition, transfer, and parsing compared to simple binary protocols. Parsing and validation of XML against a schema is expensive. This paper presents a high-performance XML parsing and validation technique that is time and space optimal. A schema-specific parsing method is developed that uses a two-stack push-down automaton for single-pass parsing and validation without backtracking. The schemas and validity constraints are packed in a compact parsing table derived from a permutation phrase grammar. This approach reduces both the space and time requirements of XML parsing and validation. Other schema-specific validating XML parsing methods trade efficiency for space (larger code and/or data size) or trade space for efficiency (backtracking). Performance results show that the method is significantly faster than traditional validating and non-validating XML parsers.
  2. R. van Engelen and W. Zhang. An Overview and Evaluation of Web Services Security Performance Optimizations. In Proceedings of IEEE International Conference on Web Services (ICWS), pages 137-144, Beijing, China, September 23-26, 2008.
    • Abstract: WS-Security is an essential component of the Web services protocol stack. WS-Security provides end-to-end security properties (integrity, confidentiality, and authentication) through the open W3C standards XML Encryption and XML Signature. End-to-end security properties assure the participation of non-secure transport intermediaries in message exchanges, a key advantage in Web systems. However, compared to point-to-point messaging with TLS, WS-Security has a significant performance penalty. In this paper, we survey several techniques for WS-Security signature performance optimization for message integrity and compare experimental results to determine the overall combined performance impact. We also compare the performance to TLS for point-to-point message integrity and confidentiality.
  3. R. van Engelen and W. Zhang. Identifying Opportunities for Web Services Security Performance Optimizations. In Proceedings of IEEE Congress on Services - Part I, Hawaii, July 6-11, 2008.
    • Abstract: WS-Security is an essential component of the Web services protocol stack. WS-Security provides end-to-end security properties, thereby assuring the participation of non-secure transport intermediaries in message exchanges, a key advantage in Web-based systems. However, compared to point-to-point secure messaging with TLS, WS-Security has a significant performance penalty. In this paper, we identify several opportunities for optimizing WS-Security.
  4. M. R. Head, M. Govindaraju, R. van Engelen, and W. Zhang. Benchmarking XML processors for applications in grid web services. In Proceedings of SC’06 (Supercomputing): International Conference for High Performance Computing, Networking, and Storage, Tampa, FL, USA, November 11-17, 2006.
    • Abstract: Numerous XML processing tools exist today, each of which is optimized for specific features. To make the right decisions, grid application and middleware developers must thus understand the complex dependencies between XML features and the application. We propose a standard benchmark suite for quantifying, comparing, and contrasting the performance of XML processors under a wide range of representative use cases. The benchmarks are defined by a set of XML schemas and conforming documents. To demonstrate the utility of the benchmarks and to provide a snapshot of the current XML implementation landscape, we report the performance of many different XML implementations on the benchmarks and draw conclusions about their current performance characteristics. We also present a brief analysis of the current shortcomings and required critical design changes for multi-threaded XML processing tools to run efficiently on emerging multi-core architectures.
  5. W. Zhang and R. van Engelen. A table-driven streaming XML parsing methodology for high-performance web services. In Proceedings of IEEE International Conference on Web Services (ICWS), pages 197-204, Chicago, IL, USA, September 18-22, 2006, (Best Student Paper Award, acceptance rate: 18%, invited for journal extension by International Journal of Web Services Research).
    • Abstract: This paper presents a table-driven streaming XML parsing methodology, called TDX. TDX expedites XML parsing by pre-recording the states of an XML parser in tabular form and by utilizing an efficient runtime streaming parsing engine based on a push-down automaton. The parsing tables are automatically produced from the XML schemas or a WSDL service description. Because the schema constraints are pre-encoded in a parsing table, the approach effectively implements a schema-specific XML parsing technique that combines parsing and validation into a single pass. This significantly increases the performance of XML Web services, which results in better response time and may reduce the impact of the flash-crowd effect. To implement TDX, we developed a parser construction toolkit to automatically construct parsers in C code from WSDLs and XML schemas. We applied the toolkit to an example Web services application and measured the raw performance compared to popular high-performance parsers written in C/C++, such as eXpat, gSOAP, and Xerces. The performance results show that TDX can be an order of magnitude faster.
  6. R. van Engelen, M. Govindaraju, and W. Zhang. Exploring remote object coherence in XML web services. In Proceedings of IEEE International Conference on Web Services (ICWS), pages 249-256, Chicago, IL, USA, September 18-22, 2006.
    • Abstract: Object coherence in platform-specific and tightly-coupled systems is achieved with binary serialization protocols to ensure data structures and object graphs are safely transmitted, manipulated, and stored. On the opposite side of the spectrum are platform-neutral Web services that embrace XML as a serialization protocol for building loosely coupled systems. The advantages of XML to connect heterogeneous systems are plenty, but rendering programming-language specific data structures and object graphs in text form incurs a performance hit and presents challenges for systems that require object coherence. Achieving the latter goal poses difficulties by a phenomenon that is sometimes referred to as the "impedance mismatch" between programming language data types and XML schema types. This paper examines the problem, debunks the O/X-mismatch controversy, and presents a mix of static/dynamic algorithms for accurate XML serialization.
  7. D. A. Gaitros, W. Zhang, A. Mast, G. Riccardi, and F. Ronquist. A Biodiversity semantic associative annotation tool. In Proceedings of International Conference on Internet Computing (ICOMP'06), pages 29-35, Las Vegas, Nevada, USA, June 26-29, 2006.
    • Abstract: This paper presents a methodology for the creation of an annotation tool for on-line collaboration and sharing of heterogeneous data through a common medium. The annotation tool combines the advantages of a highly organized relational database, extensible XML schemas, Life Science Identifiers, and accepted industry ontology using DataGrid technologies to facilitate the collection and sharing of information on biological specimens. Schematized annotation provides biologists with a flexible framework to perform annotations using their own data models. Structured XML documents enable structure-based semantic retrieval to improve query accuracy. Retrieval performance can also be improved by combining the relational database and XML documents.
  8. W. Zhang and R. van Engelen. TDX: a high-performance table-driven XML parser. In Proceedings of the ACM Southeast conference, pages 726-731, Melbourne, FL, USA, March 10-12, 2006.
    • Abstract: This paper presents TDX, a table-driven XML parser. TDX combines parsing and validation into one pass to increase the performance of XML-based applications, such as Web services. The TDX approach is based on the observation that context-free grammars can be automatically derived from XML schema. We developed a parser construction tool to automatically construct TDX grammar productions from a schema. Grammar tokens are defined by the specific schema element names, attribute names, and text. Because most of the structural constraints in XML schema are cast as grammar rules, parsing and validation of XML instances are efficiently implemented.
  9. R. van Engelen, W. Zhang, and M. Govindaraju. Toward remote object coherence with compiled object serialization for distributed computing with XML web services. In Proceedings of Compilers for Parallel Computing Workshop, pages 441-455, 2006.
    • Abstract: Cross-platform object-level coherence in Web services-based distributed systems and grids requires lossless serialization to ensure programming-language specific objects are safely transmitted, manipulated, and stored. However, Web services development tools often suffer from lossy forms of XML serialization, which diminishes the usefulness of XML Web services as a competitive approach to binary protocols. The difficulty mainly originates from the impedance mismatch between programming language data types and XML schema types. To overcome this obstacle, we propose hybrid static/dynamic algorithms to support lossless serialization of programming-language specific binary-encoded object graphs to text-based XML trees, while staying within the limits imposed by XML schema validation and the XSD type system. This paper presents a compiler-based approach to automatically emit serialization routines for C and C++ data types to XML.
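
The permutation phrases mentioned in the abstract of the first paper above describe content whose children may appear in any order, as with xs:all in XML Schema. A rough sketch of why such content can be checked without enumerating every ordering is given below. It does not reproduce the parsing-table encoding from the paper; it assumes a simplified model in which each child occurs at most once, and make_permutation_checker is a hypothetical name chosen for the example.

```python
def make_permutation_checker(required, optional=()):
    """Build a checker for children that may appear in any order, at most once."""
    # Required names take the low bits so one mask covers all of them.
    index = {name: bit for bit, name in enumerate((*required, *optional))}
    required_mask = (1 << len(required)) - 1

    def check(children):
        seen = 0
        for name in children:
            bit = index.get(name)
            if bit is None:
                return False  # element is not part of the phrase
            if seen & (1 << bit):
                return False  # element appears twice
            seen |= 1 << bit
        return (seen & required_mask) == required_mask

    return check
```

A checker built with `make_permutation_checker(["name", "email"], ["phone"])` accepts the children in any order and rejects missing, repeated, or unknown elements. The state is a single bit set, so the work per child is constant and n children do not require n! alternative productions.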

Technical Skills:

Presentations:

  1. High-Performance XML Parsing and Validation with Permutation Phrase Grammar Parsers. In Proceedings of IEEE International Conference on Web Services (ICWS), pages 286-294, Beijing, China, September 23-26, 2008.
  2. An Overview and Evaluation of Web Services Security Performance Optimizations. In Proceedings of IEEE International Conference on Web Services (ICWS), pages 137-144, Beijing, China, September 23-26, 2008.
  3. A table-driven streaming XML parsing methodology for high-performance web services, IEEE International Conference on Web Services (ICWS), Chicago, IL, USA, September 18-22, 2006.
  4. Exploring remote object coherence in XML web services, IEEE International Conference on Web Services (ICWS), Chicago, IL, USA, September 18-22, 2006.
  5. TDX: a high-performance table-driven XML parser, ACM Southeast Conference (ACMSE), Melbourne, FL, March 10-12, 2006.

Research Projects:


Publications

  1. Wei Zhang and Robert van Engelen. An Adaptive XML Parser for Developing High-Performance Web Services. In Proceedings of IEEE e-Science Conference workshop on Advances in High-Performance E-Science Middleware and Applications (AHEMA), 2008.
  2. Wei Zhang and Robert van Engelen. High-Performance XML Parsing and Validation with Permutation Phrase Grammar Parsers. In Proceedings of IEEE International Conference on Web Services (ICWS), pages 286-294, Beijing, China, September 23-26, 2008.
  3. Robert van Engelen and Wei Zhang. An Overview and Evaluation of Web Services Security Performance Optimizations. In Proceedings of IEEE International Conference on Web Services (ICWS), pages 137-144, Beijing, China, September 23-26, 2008.
  4. Robert van Engelen and Wei Zhang. Identifying Opportunities for Web Services Security Performance Optimizations. In Proceedings of IEEE Congress on Services - Part I, Hawaii, July 6-11, 2008.
  5. Michael R. Head, Madhusudhan Govindaraju, Robert van Engelen, and Wei Zhang. Benchmarking XML processors for applications in grid web services. In Proceedings of SC’06 (Supercomputing): International Conference for High Performance Computing, Networking, and Storage, Tampa, FL, USA, November 11-17, 2006.
  6. Wei Zhang and Robert van Engelen. A table-driven streaming XML parsing methodology for high-performance web services. In Proceedings of IEEE International Conference on Web Services (ICWS), pages 197-204, Chicago, IL, USA, September 18-22, 2006, (Best Student Paper Award).
  7. Robert van Engelen, Madhusudhan Govindaraju, and Wei Zhang. Exploring remote object coherence in XML web services. In Proceedings of IEEE International Conference on Web Services (ICWS), pages 249-256, Chicago, IL, USA, September 18-22, 2006.
  8. David A. Gaitros, Wei Zhang, Austin Mast, Greg Riccardi, and Fredrick Ronquist. A Biodiversity semantic associative annotation tool. In Proceedings of International Conference on Internet Computing (ICOMP'06), pages 29-35, Las Vegas, Nevada, USA, June 26-29, 2006.
  9. Wei Zhang and Robert van Engelen. TDX: a High-Performance Table-Driven XML Parser. In Proceedings of ACM SouthEast conference, pages 726-731, 2006.
  10. Robert van Engelen, Wei Zhang, and Madhusudhan Govindaraju. Toward Remote Object Coherence with Compiled Object Serialization for Distributed Computing with XML Web Services. In Proceedings of Compilers for Parallel Computing (CPC), pages 441-455, 2006.

Useful Links

  1. Advice on Research and Writing
  2. Online LaTeX Tutorial
  3. Wen Xue City
  4. Book Search and Price Comparison
  5. CS Home
  6. SCS Home