ANALYSIS AND OPTIMIZATION FOR PROCESSING GRID-SCALE XML DATASETS

BY

MICHAEL REUBEN HEAD

BS, Harpur College, Binghamton University, 1999
BS, Watson School, Univ 1999
MA, Brandeis University, 2004

DISSERTATION

Submitted in partial fulfillment of the requirements for
the degree of Doctor of Philosophy in Computer Science
in the Graduate School of
Binghamton University
State University of New York
2009

© Copyright by Michael R. Head 2009
All Rights Reserved

Accepted in partial fulfillment of the requirements for
the degree of Doctor of Philosophy in Computer Science
in the Graduate School of
Binghamton University
State University of New York
2009

November 30, 2009

Madhusudhan Govindaraju, Department of Computer Science, Binghamton University
Leslie Lander, Department of Computer Science, Binghamton University
Michael Lewis, Department of Computer Science, Binghamton University
Kenneth Chiu, Department of Computer Science, Binghamton University
Fernando Guzman, Department of Mathematics, Binghamton University

Abstract

In the field of Scientific Computing, two trends are clear: the size of data sets in use is growing rapidly and microprocessor performance is improving through increases in parallelism, rather than through clock rate increases. Further, Extensible Markup Language (XML) is increasingly being used to encode large data sets, and SOAP is being used to provide Grid services – uses XML and SOAP were never designed for, and naïve implementations of these standards can ...