Dun & Bradstreet Embraces XML
XML middleware provides business-to-business solutions
by Jon Udell, XML Magazines
Harvey Bowring Credit Underwriting Unlimited, a London-based credit insurer, serves corporate policyholders who need to insure their trade with other companies against the risk of bad debt and insolvency. Bowring delivers this underwriting service using a system that is entirely Internet-based and highly automated. A customer accesses the Bowring Web site (www.creditinsure.com) and selects one of 17 million companies listed by Dun & Bradstreet. Bowring's NT server fetches an XML-formatted data stream from D&B, pumps that data into a risk-assessment matrix, computes the amount of coverage it can safely offer for trade between this customer and the target company, and then sells the policy. "Traditionally this is a longwinded process," says David Baker, Bowring's director. "An underwriter might take a couple of days to say you can or can't insure a company, and for how much. Using D&B to assess the risk, we have automated the process and literally reduced turnaround times from days to seconds."
This online-only business started in late 1997, based on another Dun & Bradstreet service called Direct Connect. A European-only, X.25-borne data feed, Direct Connect delivered similar content (raw D&B credit-worthiness information) but was far less flexible. "With Direct Connect, D&B had to program the data feed for us," says Baker. "Global Access enables us to do that ourselves."
Reengineered Data, Reinvented Delivery
Dun & Bradstreet has long been in the business of delivering business information keyed to its trademark D-U-N-S number, a nine-digit code that identifies more than 50 million companies worldwide tracked by D&B. The new Global Access platform advances the D&B technical infrastructure in two ways. First, it's a massive long-term initiative to rationalize the business information supplied by D&B subsidiaries and affiliates around the world. D&B has acquired many databases over many years and used different techniques for storing and transferring data. The data varies in quality and, what's worse, often means different things in different regions. Global Access defines a common vocabulary of data elements, and aims to map a vast array of data sources to that common vocabulary.
D&B customers such as Harvey Bowring can hardly wait. "At the moment we can take only US and Canadian data," says David Baker, "but as D&B link in Brazil or Japan or Australia we'll be able to automatically fit them into our applications."
Global Access also reinvents how D&B will deliver its data. Customers traditionally have bought packaged reports, not raw data, and the creation of a customized report has been a slow and cumbersome process. The new scheme turns the old one upside down. Global Access defines an inventory of data packages, which are clusters of fields related to D-U-N-S-listed companies. And it gives customers like Harvey Bowring a Java- or COM-based toolkit which they use to build D&B-aware applications that fetch and process raw data streams.
"In the toolkit we express the D&B business in an object model," says Tom Gwydir, project director for D&B's global technology organization. "Customers no longer have to go get a D&B report, read it, and make a business decision; now they can embed our business intelligence directly in their applications-including their ERP [enterprise resource planning] tools." This new focus on user-customizable data feeds as a primary deliverable is a "huge change for D&B," adds John Peterson, architect of the Global Access toolkit.
The difference is night and day. Historically a customized D&B business application required the development of a mainframe CICS data pull, which might take months to accomplish. Then the customer would have to create its own network link to D&B and hold up its end of a proprietary protocol. In the new scheme, D&B publishes a catalog listing a set of data products. The customer fetches the catalog over the Web, creates its own customized feed, and writes a little glue code to integrate that feed into an application.
At Harvey Bowring, this freed up development resources to focus on the real task: adding value to D&B's data. "It took us four or five months to fully develop our mechanisms to interact with the D&B data feed using Direct Connect," says David Baker. With the new Global Access feed, his team was searching and analyzing the data in a few weeks. From a business rules perspective things took longer, but that's because Bowring used the new and more flexible access technology to analyze lots of D&B data more thoroughly than was previously possible. The result was a much-improved version of its core asset, the risk matrix.
Trading Up from EDI to XML
D&B's Global Access system is one more example of what's become one of this year's hottest IT trends: recreating EDI (electronic data interchange) applications on an XML foundation. D&B is no stranger to conventional EDI; it's one of relatively few companies that have invested in it over the years. Why throw out what works?
Actually, D&B hasn't. On the content engineering side of the Global Access project, the EDI investment continues to pay off. "We've stolen a lot of the techniques we're using to rationalize the data from the EDI experience," says Tom Gwydir. But the EDI protocols are complex and cumbersome. And the VANs (value-added networks) used to deploy EDI applications are costly to create and maintain. What EDI lacks, in a word, is ubiquity. "We need to collect data from places like Uganda and Uzbekistan," says Gwydir, "and gee, with XML we can do that over the Internet with a thin protocol, without setting up VANs or our own IP backbones."
Gwydir admits that there's a tradeoff. The EDI discipline combines syntax and semantics. It defines how to structure the data, not just how to package and transmit it. With XML, he says, you only get syntax. "But our best EDI people have told us how to structure the data." He says D&B's EDI and XML efforts are coming together fruitfully.
XML-Based Toolkit Offers Flexibility, Ease of Use
Users can deploy D&B's Global Access Toolkit on a server or a client, as a set of COM components or Java libraries. Harvey Bowring's credit underwriting application, for example, was built with the COM version of the toolkit and interacts with IIS, Active Server Pages, and SQL Server on Compaq servers running NT. The same COM toolkit can be deployed client-side, delivering realtime D&B data-awareness to a Win32-based application. An alternative Java version of the toolkit delivers the same functionality on non-Windows servers and clients. Either way, the toolkit hides a lot of the XML plumbing that's needed to negotiate transactions with the Global Access Server. It exposes a scriptable object model (see "The hidden middle: Global Access Toolkit's object model") to the programmer.
Not an angle bracket or an XML parser in sight! Despite the object model's neat encapsulation, the XML machinery the toolkit hides isn't terribly complex. It's a request/response protocol called DGX (D&B Global Exchange), modeled on OFX (Open Financial Exchange), the Intuit/Microsoft standard that governs home banking (see Listing 2). OFX is currently based on SGML (Standard Generalized Markup Language), but is rapidly migrating toward XML. To create DGX, the Global Access team adopted OFX's SGML DTDs (Document Type Definitions) and recast them as simpler but equivalent XML DTDs.
Why bother with XML if it's all hidden from the user anyway? Several reasons. First, D&B doesn't lock users in to its toolkit. You can equally well talk to a Global Access server by opening up a socket and exchanging DGX-formatted messages, using any SSL-capable Web scripting tool to send requests and any XML parser to unpack responses. This is possible because XML-based middleware, unlike DCOM- or IIOP-based middleware, is fundamentally simple stuff. D&B isn't selling the toolkit, only the data. It made sense to embrace a radically open model that supports both official and unofficial modes of access.
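To make the "unofficial" route concrete, here is a minimal sketch of what building a request and unpacking a response might look like with nothing but a stock XML parser. The DGX element names used here (DGX, REQUEST, RESPONSE, COMPANY, DUNS, PACKAGE, NAME, RATING), the package name, and the sample D-U-N-S number are all invented for illustration; the real DGX vocabulary is not shown in this article, and the HTTPS transport is left out.

```python
import xml.etree.ElementTree as ET


def build_request(duns: str, package: str) -> str:
    """Serialize a hypothetical DGX-style request for one company."""
    root = ET.Element("DGX")                      # element names are assumptions
    request = ET.SubElement(root, "REQUEST")
    ET.SubElement(request, "DUNS").text = duns
    ET.SubElement(request, "PACKAGE").text = package
    return ET.tostring(root, encoding="unicode")


def parse_response(xml_text: str) -> dict:
    """Unpack a hypothetical DGX-style response into a flat dict of fields."""
    root = ET.fromstring(xml_text)
    company = root.find("./RESPONSE/COMPANY")
    if company is None:
        raise ValueError("response contains no COMPANY element")
    return {child.tag: (child.text or "").strip() for child in company}


# A made-up response, standing in for what a server might send back.
SAMPLE_RESPONSE = """<DGX><RESPONSE><COMPANY>
<DUNS>123456789</DUNS><NAME>Example Trading Ltd</NAME><RATING>A1</RATING>
</COMPANY></RESPONSE></DGX>"""
```

Calling `build_request("123456789", "RISK_SUMMARY")` yields a one-line XML string, and `parse_response(SAMPLE_RESPONSE)` returns the company's fields as a dictionary. That is the whole trick: the protocol is text in, text out.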
Another reason to go with XML is that it's the native lingo of webMethods' B2B Integration Server, which sits in the middle of a three-tier architecture. Clients talk XML to the Integration Server, which in turn talks XML to Global Data Access (GDA), a hub with spokes that radiate out to all sorts of D&B databases. These include IBM mainframes in the US and elsewhere, VAXes in some northern European countries, and even Novell servers in some parts of the world. The protocols spoken between all these back-end data sources and Global Data Access are proprietary ones, with names like DUNS-LINK (North America), DART (Europe), and STRIDES (Asia/Pacific). But the B2B Integration Server at the heart of Global Access is both a provider and a user of XML interfaces.
The Global Access system comprises three tiers. Clients use the COM or Java toolkits, or straight XML, to talk to the B2B Integration Server in the middle tier. The Integration Server then acts as a client of Global Data Access, an umbrella service in the data tier that maps a hodge-podge of back-end data sources into a common set of XML-based interfaces.
There are a number of these interfaces, because Global Access isn't just a massive data pump. Usage must be tracked and billed, and the mechanisms for doing these things are as diverse as the data sources themselves. Traditionally D&B has done billing country by country. In the new scheme there's a common billing interface, and a series of adapters that map the legacy systems to it. How? By every means imaginable: mainframe software, ASP pages, Java, you name it. "It all depends on what host we're fronting," says Peterson. "We have the whole world out there."
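The common-interface-plus-adapters idea can be sketched in a few lines. Everything below is hypothetical: the record fields, the two adapter classes, the output formats, and the region keys are placeholders chosen for illustration, not D&B's actual billing design.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class UsageRecord:
    """One billable event (hypothetical fields)."""
    customer_id: str
    duns: str
    package: str


class FixedWidthBilling:
    """Adapter that formats a fixed-width line, as a mainframe host might expect."""

    def record(self, usage: UsageRecord) -> str:
        return f"{usage.customer_id:<10}{usage.duns:<9}{usage.package:<12}"


class FormPostBilling:
    """Adapter that formats a URL-encoded body, as a Web-fronted host might expect."""

    def record(self, usage: UsageRecord) -> str:
        return urlencode(
            {"customer": usage.customer_id, "duns": usage.duns, "package": usage.package}
        )


# Which adapter fronts which region is an assumption made for this sketch.
ADAPTERS = {"US": FixedWidthBilling(), "EU": FormPostBilling()}


def bill(region: str, usage: UsageRecord) -> str:
    """The common billing interface: callers never see the legacy format."""
    return ADAPTERS[region].record(usage)
```

The caller hands `bill()` the same `UsageRecord` regardless of region; only the adapter knows what the host on the other side wants to see.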
In most cases D&B directly controls the back-end systems and intervening middleware, and thus can guarantee delivery of well-formed and valid XML to the middle tier. It's just structured ASCII text, after all; any programming tool can produce it. But D&B doesn't always control the back end or the middleware. What then? "We hope like hell somebody puts a Web site on top of it," says Peterson. In one pre-production situation where XML-formatted data wasn't available, he used webMethods' sophisticated HTML screen-scraping technology to map the HTML produced by that system into the XML interfaces defined in the Global Access middle tier. Because this approach doesn't require the explicit cooperation of the back-end system, it's a great way to solve the kinds of integration challenges presented by a heterogeneous mix of real-world systems such as D&B's.
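For readers who haven't seen screen-scraping in action, the following sketch shows the general technique using only Python's standard library. It is not webMethods' implementation. The HTML layout (table cells tagged with a class attribute) and the output element names are assumptions made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class CellScraper(HTMLParser):
    """Collect the text of <td class="..."> cells, keyed by class name."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # class of the <td> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._current = dict(attrs).get("class")

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = self.fields.get(self._current, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._current = None


def html_to_xml(page: str) -> str:
    """Map a scraped result page onto a hypothetical COMPANY element."""
    scraper = CellScraper()
    scraper.feed(page)
    scraper.close()
    company = ET.Element("COMPANY")
    for name, text in scraper.fields.items():
        ET.SubElement(company, name.upper()).text = text
    return ET.tostring(company, encoding="unicode")


# A made-up result page, standing in for a back end that only speaks HTML.
SAMPLE_PAGE = (
    '<html><body><table><tr>'
    '<td class="name">Example Trading Ltd</td><td class="duns">123456789</td>'
    '</tr></table></body></html>'
)
```

The fragility is obvious: change the page layout and the scraper breaks. That is why it sits on the bottom rung of the integration ladder, useful mainly when the back end offers nothing better.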
The webMethods technology is based on WIDL, or Web Interface Definition Language, which is described in the W3C document at www.w3.org/TR/NOTE-widl. It's used to map Web-accessible services (expressed as statically or dynamically served pages) to other services exported by the Integration Server. Like the interface definition languages at the heart of other distributed-object systems such as DCE, DCOM and CORBA, WIDL names services and describes their inputs and outputs.
In screen-scraping mode, the B2B Integration Server implements WIDL interfaces natively. Listing 3 is a simplified definition of a Web-accessible service that might be used to look up a D&B company. WIDL encapsulates the URL and associated HTML form variables used to perform a lookup as well as the HTML result page produced by the form.
Once the WIDL service has executed a Web transaction and absorbed the results into its internal object model, how does it express the data? That's controlled by an output template that intermixes control logic (for example, "loop" ... "endloop"), bound variables, and whatever kind of markup (typically HTML or XML) is required by the process or person that uses the service.
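The idea of an output template is easy to demonstrate with a toy renderer. The brace syntax below is invented for this sketch and is not webMethods' template language. It supports one level of looping and simple variable substitution, and it escapes values so they are safe to drop into XML element content.

```python
import re
from xml.sax.saxutils import escape

# Either a {loop name}...{endloop} block or a single {name} variable.
TOKEN = re.compile(r"\{loop (\w+)\}(.*?)\{endloop\}|\{(\w+)\}", re.S)


def render(template: str, values: dict) -> str:
    """Expand loops and bound variables in a toy template."""

    def replace(match):
        loop_name, body, var_name = match.groups()
        if loop_name is not None:
            # Render the loop body once per item, with the item's fields in scope.
            return "".join(render(body, {**values, **item}) for item in values[loop_name])
        return escape(str(values[var_name]))

    return TOKEN.sub(replace, template)


TEMPLATE = '<COMPANIES>{loop companies}<COMPANY duns="{duns}">{name}</COMPANY>{endloop}</COMPANIES>'

DATA = {
    "companies": [
        {"duns": "123456789", "name": "Smith & Sons"},
        {"duns": "987654321", "name": "Example Ltd"},
    ]
}
```

Swap the template for one full of table tags and the same data comes out as HTML, which is the point: the service's results are independent of the markup that carries them.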
Usually the data-gathering services in the Global Access middle tier go against data sources that produce XML. But if necessary, they can use data sources that produce only HTML. The ability for the middle tier to intermix both modes when talking to the data tier, and to convert everything into a common set of WIDL-defined services that it exposes to the client tier, makes the webMethods system a powerful integration tool.
The Role of Integration Modules
The work of transforming back-end data into XML is largely handled by adapters bolted onto data sources. But the work of amalgamating those data sources and their associated services (usage, billing) into a set of common core services is handled by the B2B Integration Server. Its built-in WIDL processor isn't sufficient for this task. So D&B built a set of integration modules that are based on WIDL interfaces but implemented in Java and, in one case, Visual Basic.
The B2B Integration Server is a kind of Java Web server, and integration modules are like servlets that plug into that server. The validation module, for example, is a Java service that authenticates toolkit users against a SQL Server database. As an early adopter of the webMethods tools, D&B had to roll its own JDBC code to access that database. In version 2.0, webMethods added a generic JDBC service to its engine.
Like all of D&B's integration modules, the validation module began as a WIDL file that names an interface and defines inputs and outputs. That file was then fed to B2B Developer, webMethods' Java-based IDE (integrated development environment), in order to generate skeletal Java code which implemented the interface. Although B2B Developer typically derives WIDL by interactive exploration of HTML- and CGI-based Web sites, D&B seldom utilizes that feature because Global Access is built on a foundation of formal XML interfaces, not on HTML screen-scraping. So the D&B team writes WIDL interfaces by hand, but uses Developer to jump-start the implementations of the corresponding modules.
A typical integration module exports a conventional Web-style CGI-like interface. In other words, you invoke it from a Web form that transmits URL-encoded name/value pairs to the service. These pairs are converted for the module's internal use into a Java Values object (an extended hash table). The module grabs the Values object, does its work, and modifies the Values object accordingly. Then the B2B Integration Server sends the object's data through an output template to create HTML or XML.
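That request lifecycle can be sketched as follows. A plain Python dict stands in for the Java Values object, and the module, its lookup table, and the output element names are all hypothetical; the real modules are Java services running inside the B2B Integration Server.

```python
from urllib.parse import parse_qsl
import xml.etree.ElementTree as ET


def lookup_module(values: dict) -> None:
    """Hypothetical module: read inputs from `values`, write outputs back into it."""
    known = {"123456789": "Example Trading Ltd"}   # made-up data
    values["name"] = known.get(values.get("duns", ""), "")
    values["found"] = "true" if values["name"] else "false"


def handle(form_body: str) -> str:
    """Decode name/value pairs, run the module, and emit XML from the result."""
    values = dict(parse_qsl(form_body))            # stand-in for the Values object
    lookup_module(values)
    result = ET.Element("RESULT", found=values["found"])
    ET.SubElement(result, "DUNS").text = values.get("duns", "")
    ET.SubElement(result, "NAME").text = values["name"]
    return ET.tostring(result, encoding="unicode")
```

A call such as `handle("duns=123456789")` walks the whole path: form body in, dictionary mutated by the module, markup out.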
D&B's integration modules work a bit differently. They're driven directly by DGX-formatted requests. These don't map neatly to CGI-style name/value pairs. So when the Global Access toolkit hands a request to the validation module, it passes a single name/value pair whose value is a complex DGX request packet. The module then transforms the XML into a tree-structured object by invoking the same parsing engine used by the WIDL processor. Why do this? The webMethods kit is designed to support modules whose output is XML, not modules driven by XML. The webMethods marketing message is "XML everywhere," Peterson says, "but we're having to push the tool in the direction they're headed."
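Here is a rough illustration of tunneling an XML packet through a single form field and turning it back into a tree on the other side. The field name `dgx` and the packet's element names are assumptions made for this sketch, and ElementTree stands in for the webMethods parsing engine.

```python
from urllib.parse import urlencode, parse_qs
import xml.etree.ElementTree as ET

# A made-up request packet; the real DGX vocabulary is not shown in this article.
PACKET = (
    "<DGX><SIGNON><USERID>demo</USERID></SIGNON>"
    "<REQUEST><DUNS>123456789</DUNS></REQUEST></DGX>"
)


def pack(packet: str) -> str:
    """Client side: wrap the whole XML packet in one URL-encoded field."""
    return urlencode({"dgx": packet})


def unpack(form_body: str) -> ET.Element:
    """Module side: pull the single field out and parse it into a tree."""
    fields = parse_qs(form_body)
    return ET.fromstring(fields["dgx"][0])
```

After `tree = unpack(pack(PACKET))`, the module can navigate the request with calls like `tree.findtext("./REQUEST/DUNS")` instead of fishing through flat name/value pairs.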
Integration modules can be written in Java, C/C++, or Visual Basic. Most of the Global Access services are Java-based, but the billing service, which uses the Microsoft Message Queue Server to transport XML, is COM-based. It's written in VB because VB interfaces nicely with Microsoft's message-oriented middleware. Here too D&B is pushing the envelope. A future release of the webMethods kit will offer a generic interface to message-oriented middleware, but the Global Access project couldn't wait for it.
Evaluating the Experience
Peterson's generally happy with the webMethods toolkit, and believes the project would have gone even more smoothly if there were a B2B Integration Server in the data tier as well as the middle tier. In webMethods' marketing lingo, that's rung 3 of a "ladder of integration" that goes like this:
1. Integration Server screen-scrapes HTML Web sites. Global Access can use this mode, but almost never does.
2. Integration Server interacts with XML Web sites. Global Access mostly operates in this mode.
3. Integration Server interacts with a partner Integration Server. This mode would facilitate realtime bidirectional data exchange.
Currently only the middle tier runs an Integration Server, and this was a midstream shift. Originally it had been based on Microsoft's IIS plus extension modules. Then D&B found webMethods and switched to it, retaining the high-level design while completely replacing its implementation. Peterson believes the result is both more stable and more configurable than the IIS-based solution could have been. "We like the model webMethods gives us to divide our world into services," he says.
The data tier, though, is fronted by an IIS server with extensions that talk to all the data sources. Going forward, there's a need to enrich the services provided by the data tier to the middle tier. The Global Access usage module, for example, doesn't link to the usage tracking performed by the various back ends; instead it creates and manages its own usage data. Likewise the validation module, which recreates customer data rather than linking to existing customer files. Peterson thinks an Integration Server in the data tier would be a good way to consolidate services it doesn't currently offer to the middle tier, but should.
Any complaints about webMethods? Most notably, the WIDL parser works only in DOM (document object model, aka tree-building) mode, which slows down the processing of large data feeds. Many XML parsers complement this method with a streaming mode that issues callbacks when each element is parsed. That's often a better fit for applications processing XML in a single pass that may not even need an in-memory DOM at all. "When a response contains many companies (I've seen as much as 3MB come down), we'd be happy parsing them one at a time," says Peterson. Until webMethods adds a streaming mode to its parser, as it says it will, Global Access will run slowly in this (relatively rare) situation.
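The one-company-at-a-time approach Peterson describes can be sketched with an incremental parser. This example uses Python's `iterparse`, which hands back each element as its end tag is read. It is a pull-style cousin of the callback approach rather than a callback API itself, and the element names are again assumptions.

```python
import io
import xml.etree.ElementTree as ET


def iter_companies(stream):
    """Yield one dict per COMPANY element as soon as it has been fully read."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "COMPANY":
            yield {child.tag: child.text for child in elem}
            elem.clear()  # discard this company's children once they are consumed


# A made-up two-company response, standing in for a multi-megabyte feed.
FEED = (
    b"<RESPONSE>"
    b"<COMPANY><DUNS>123456789</DUNS><NAME>A Ltd</NAME></COMPANY>"
    b"<COMPANY><DUNS>987654321</DUNS><NAME>B Ltd</NAME></COMPANY>"
    b"</RESPONSE>"
)

companies = list(iter_companies(io.BytesIO(FEED)))
```

Each company's fields are released as soon as the caller has consumed them, so memory use stays close to the size of one record rather than the size of the whole response. (The emptied COMPANY shells remain attached to the root in this simple version.)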
The Global Access toolkit suffered from a different parser annoyance. Which one should it use to process the responses coming back from the middle tier? There are plenty to choose from, but no clear standard yet on the desktop. Internet Explorer includes one, but "we had performance problems with it," says Peterson, "and the interface to it keeps changing." In the end, D&B rolled its own XML parser for the toolkit. Which, as it turns out, Harvey Bowring doesn't use, because its team rolled another homegrown XML parser. This isn't as crazy as it sounds. A key design goal for XML was to ensure that parser construction would be a trivial task. That's especially true when its domain is limited, as is the case with a protocol like DGX. Once XML parsers are better established as standard system components, it'll be straightforward to adopt them.
Another glitch for the toolkit was the lack of a standard SSL-aware library. The native transport for Global Access is HTTPS (secure HTTP), useful because it's both secure and firewall-friendly. But although we take HTTPS for granted for interactive applications (because all browsers speak it), other applications don't. Applications running on Win32 systems with Internet Explorer installed can leverage WININET.DLL, but that's not available to Netscape users. And what about the Java version of the Global Access toolkit? The standard Java runtime doesn't include SSL capability either. The solution, for the Java-based version of the toolkit, was to deploy webMethods libraries that enable SSL communication.
But these are minor problems in an otherwise wholly successful scenario. The big picture is quite clear: XML is already delivering workable business-to-business solutions. And it's doing so in ways that leverage the ubiquity of the Web, while moving incrementally toward a model based on distributed, well-defined services. Forget DCOM and CORBA/IIOP? Maybe that's extreme, but you have to wonder what kinds of problems can't be solved with XML middleware.