May 23, 2004

Cocoon as a Web service framework

Last week I rediscovered some work done by Pankaj Kumar and others at HP on using Cocoon as a Framework for Web Services. The software for this framework (the Cocoon logicsheets for SOAP etc.) wasn't available on the website, but it would be interesting to compare its performance with conventional Web service implementations.

One of the performance bottlenecks in Web services (see my previous post on Web service performance) is the cost of converting Java to XML and back to Java- especially if you have a large, complex payload. Using something like Cocoon that can deal with XML natively (using XSL) should, in theory at least, be a better alternative. This is especially true if it is used as a SOAP intermediary that is not interested in the entire payload, but only in parts of it, like a header or specific elements in the body, and doesn't want to deserialize the XML into a Java object to get at them.
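
A minimal sketch of the intermediary idea, written in Python rather than Cocoon/XSL purely for illustration. The routing namespace `urn:example:routing` and the `Destination` header are hypothetical; only the SOAP 1.1 envelope namespace is real. The function pulls one header value out of the message with an event-driven parser and stops as soon as the header has closed, so nothing is bound to application objects.

```python
import io
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Hypothetical message: the "ex" namespace and its elements are made up for this sketch.
MESSAGE = b"""<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:ex="urn:example:routing">
  <soap:Header>
    <ex:Destination>warehouse-7</ex:Destination>
  </soap:Header>
  <soap:Body>
    <ex:Order><ex:Item>widget</ex:Item></ex:Order>
  </soap:Body>
</soap:Envelope>"""


def read_header_value(soap_bytes, header_tag):
    """Return the text of the first element named header_tag, or None.

    header_tag uses ElementTree's "{namespace}local" notation. The loop
    stops consuming parser events once the element is found or the SOAP
    header closes, so for a large payload the rest of the stream is not read.
    """
    for _event, elem in ET.iterparse(io.BytesIO(soap_bytes), events=("end",)):
        if elem.tag == header_tag:
            return elem.text
        if elem.tag == f"{{{SOAP_NS}}}Header":
            return None  # header finished without the element we wanted
    return None


destination = read_header_value(MESSAGE, "{urn:example:routing}Destination")
print(destination)
```

An intermediary built this way can make a routing decision from the header and forward the original bytes untouched, which is the saving the paragraph above is hoping for.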

[Update] I looked on the Cocoon website, and they have this interesting use of Cocoon for Portal Syndication using Web Services.

Posted by vivek at 10:25 AM | Comments (0) | TrackBack

May 20, 2004

GMail: Initial impressions

I have GMail!

Thanks to Ovidiu Predescu, I now am the proud owner of a GMail account! Since I have had the account for only a couple of hours now, I don't have a detailed review, but here are my initial impressions:

  • It's fast! The pages and emails load up with blazing speed- almost as if it were content hosted on my local machine, and not a web based email. This might be because it doesn't have a lot of users right now- but still, I'm impressed.
  • The UI is clean and minimal- just the way I like it. I have been a Hotmail user since, oh well '96? '97? and a Yahoo Mail user from when it started... and GMail has the cleanest interface around. Hotmail lately has a clunky, MSN-content-and-advertisement-saturated interface that I hate, and I rarely use it. That my Hotmail account is a spam magnet could have something to do with it too. Yahoo has a decent interface, but I dislike the image ads taking up almost one third of the page. In GMail the text ads are discreetly placed in the right column. They are noticeable though; however, I think it's all the fuss about them that makes you look at the ads and actually read them!
  • You can attach 'Labels' to your messages. They are not quite the same as classifying them in Folders- an email can have multiple Labels. I don't see an option to create a 'Folder'- I guess I need to get used to the 'Label' thing.
  • There are keyboard shortcuts for composing email, next email etc.
  • Emails in a conversation are apparently grouped together. I don't have enough emails there to test this out yet.
  • Other than classification, there is email search- again, I don't have enough emails there to test this out properly. Given Google's search engine, I don't think I'll have complaints here. In addition to text based search, you can look for messages based on the From, To, Subject etc., just like in any non-web based email client.
  • You can create filters for your email based on the From, To, Subject, and keyword presence/absence. I like.
  • And yes, the 'You are currently using 0 MB (0%) of your 1000 MB.' I most certainly like.
  • Ok, now things that I would have liked to be different:
    • Ability to create an account name with fewer than six characters. I would like to have a short email address (my first name is five characters) for a change.
    • I miss an email notification mechanism, a la biff. With Yahoo or Hotmail, if I leave the IM clients running I get notified of incoming emails. Maybe Google can add this to their Deskbar?
    • It would be nice to be able to populate the From/To fields in the search, and the To in the email compose field with email addresses in your contacts.
    • Spell check is useful. Prolonged use of Word has left me spelling impaired.
    • Sending email from the contact list pops up another window- it would be nice to not have multiple windows open up.
    • Being able to create folders and move emails to them would be nice. Yeah, I know Labels etc., but I like having an empty Inbox to look at, with all my email filed away neatly. Ok, I'm anal, but that's just the way I like things!

On the whole, Google has done a very good job with GMail. Thanks guys! And thanks again Ovidiu!

Update [One hour later]: I tried out the email grouping feature, and it's neat! I stand corrected on the To in the email compose not getting populated- it does.

Update [One and a half hours later]: Thanks to Ovidiu's weblog, I discovered GMail Gems.

Update [May 22, 2004]: Keyboard shortcuts are fast and convenient, and give it the feel of a UNIX based email client- especially the j/k keys for navigating up and down.

Update [May 23, 2004]: Spell check is available. Hmm.. was it always there and I didn't see it? Or was it recently added?

Posted by vivek at 10:52 AM | Comments (0) | TrackBack

May 19, 2004

Service-Oriented Architecture: A brief introduction

Folks, there is a new architecture in town- SOA!

What is SOA?

SOA, or Service-Oriented Architecture, is an architecture comprising

  • Loosely coupled services,
  • described by platform-agnostic interfaces
  • that can be discovered and invoked dynamically.

Loosely coupled refers to defining interfaces such that they are independent of each other's implementation. In a loosely coupled system, you should be able to swap out one of the components and replace it with another without affecting the rest of the system. [10] attempts to pin down the exact implications of loosely coupled systems.

The platform-agnostic interface means that a client on any platform (OS, language, hardware) can consume the service.

Dynamic discovery implies some kind of registry where these services are listed, and which allows lookup.
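
The three properties above can be illustrated with a toy, in-process sketch. Everything here is hypothetical (the `Registry` class, the `StockQuote` interface name, the two quote services); a real SOA would use network calls and a real registry. The point is only that the client depends on the interface name and the message shape, not on a particular implementation.

```python
from typing import Callable, Dict

# A "service" is a callable that takes and returns plain dicts. The dicts
# stand in for platform-agnostic messages (hypothetical contract).
Service = Callable[[dict], dict]


class Registry:
    """Toy in-memory registry: services are published and looked up by interface name."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def publish(self, interface: str, service: Service) -> None:
        self._services[interface] = service

    def lookup(self, interface: str) -> Service:
        return self._services[interface]


def quote_from_table(request: dict) -> dict:
    # First implementation: a fixed lookup table.
    prices = {"HPQ": 20.5}
    return {"symbol": request["symbol"], "price": prices.get(request["symbol"], 0.0)}


def quote_constant(request: dict) -> dict:
    # Second implementation: different internals, same message contract.
    return {"symbol": request["symbol"], "price": 1.0}


def client(registry: Registry) -> dict:
    # The client discovers the service at call time and knows only the contract.
    service = registry.lookup("StockQuote")
    return service({"symbol": "HPQ"})


registry = Registry()
registry.publish("StockQuote", quote_from_table)
first = client(registry)
registry.publish("StockQuote", quote_constant)  # swap the implementation
second = client(registry)
print(first, second)
```

The client code is unchanged across the swap, which is what loose coupling is meant to buy.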

Why is SOA interesting to businesses?

SOA protects your investments in legacy applications. So whether you have a Java application, a .NET application or even a COBOL application running on a mainframe- all are equal in a SOA architecture. [5] explains some of the importance of SOA to businesses. [11] is a Gartner report that predicts business acceptance trends for SOA.

Is Web services a SOA?

Yes. In Web services, WSDL is the platform-agnostic interface and UDDI is the registry where services are published and discovered from. The invocation is via SOAP messages on a variety of transports (HTTP, HTTPS, SMTP, JMS, roll-your-own-transport).

However, the converse need not be true- you can have a Service-Oriented Architecture without Web services, or even XML.

Is CORBA/DCE/DCOM/RMI a SOA ?

All distributed computing technologies have a concept of services, are defined by interfaces, and are platform agnostic. However, for a variety of reasons- some technical and others not- Web services are emerging as the standard way to do services. [9] explains some of the differences between Web services and traditional distributed computing technologies.

References:

[1] IBM developerWorks. New to SOA and Web services. http://www-106.ibm.com/developerworks/webservices/newto/

[2] Kishore Channabasavaiah, Kerrie Holley et al. Migrating to a service-oriented architecture, Part 1. http://www-106.ibm.com/developerworks/webservices/library/ws-migratesoa/

[3] Kishore Channabasavaiah, Kerrie Holley et al. Migrating to a service-oriented architecture, Part 2. http://www-106.ibm.com/developerworks/webservices/library/ws-migratesoa2/

[4] Sayed Hashimi. Service-Oriented Architecture Explained. http://www.ondotnet.com/pub/a/dotnet/2003/08/18/soa_explained.html

[5] Todd Datz. What You Need to Know About Service-Oriented Architecture. http://www.cio.com/archive/011504/soa.html

[6] Easwaran G. Nadhan. Service-Oriented Architecture: Implementation Challenge. http://msdn.microsoft.com/architecture/default.aspx?pull=/library/en-us/dnmaj/html/aj2soaimpc.asp

[7] Todd Datz. What You Need to Know About Service-Oriented Architecture. http://www.cio.com/archive/011504/soa.html

[8] Hao He. What is Service-Oriented Architecture? http://webservices.xml.com/pub/a/ws/2003/09/30/soa.html

[9] Werner Vogels. Web Services are NOT Distributed Objects. http://weblogs.cs.cornell.edu/AllThingsDistributed/archives/000120.html

[10] Doug Kaye. Loose Coupling is Like Pornography. http://www.rds.com/doug/weblogs/webServicesStrategies/2002/11/18.html#a726

[11] Yefim Natis, Roy Schulte (Gartner). Introduction to Service-Oriented Architecture. http://mediaproducts.gartner.com/reprints/bea_systems/114295.html

Posted by vivek at 11:35 AM | Comments (0) | TrackBack

May 16, 2004

Website update: Web service patterns

I've recently started exploring patterns for Web service usage, and have a new weblog for this purpose. Nothing earth-shattering there on the weblog right now, but as I experiment over the months and document my experiences (or share other people's experiences/experiments), I hope to have some interesting stuff there.

Posted by vivek at 11:11 PM | Comments (2) | TrackBack

May 14, 2004

Website update: Web service weblogs

I just added a list of weblogs on Web services.

Posted by vivek at 01:43 PM | Comments (0) | TrackBack

Keeping Design Simple

A post on Slashdot comparing the design of the Indian EVM (Electronic Voting Machine) with Diebold's machine highlights the importance of keeping system design as simple as possible.

Tim Berners-Lee has a short, but important list on his 'Principles of Design' webpage that re-iterates ideas that we all know, but often ignore:

  • Simplicity ("Keep it simple, stupid!")
  • Modular Design
  • Tolerance ("Be liberal in what you require but conservative in what you do")
  • Decentralization
  • Test of Independent Invention ("If someone else had already invented your system, would theirs work with yours? ")
  • Principle of Least Power

A complete list of design issues and architectural principles that guide W3C's thinking about Internet protocols (including Web service protocols) is here. It also has a proposed 'Roadmap for Web services'.

On a humorous note, a comment on the same Slashdot post - "These Indians are crazy- They build their own electronic voting machines, and outsource their Prime Ministers"- this being a reference to the Italian born Sonia Gandhi's party winning the largest number of seats in the Indian parliament.

Posted by vivek at 12:33 PM | Comments (1) | TrackBack

Competition is good

Close on the heels of Google moving onto Yahoo's turf with free email with 1GB storage, Google Groups 2 (a Yahoo Groups like service- the original Google Groups is a web interface to Usenet groups) and image banner advertisements, comes Yahoo's announcement of new service features.

Yahoo mail will now offer 100MB to free email users (this was 6MB earlier, and then reduced to 4MB), and 'virtually unlimited storage' to paid email customers.

Other new Yahoo features include a tighter integration of Yahoo Mail with Yahoo Photos (users of Yahoo Photos will be able to e-mail photos to friends without having to click over to Yahoo Mail); Internet phone calling that is integrated with instant messaging; and a deskbar that streams news headlines, weather and other information across a computer screen.

Competition is good! Besides, Yahoo started it first.

PS: An addendum to my earlier post on Google's response to privacy concerns- Google announced that it will have a for-pay version of its email service that is free of such intrusions. Maybe the proposed legislative measures had something to do with it too.

PPS: The Google-mania is reaching new heights as its IPO draws near. Today I found a weblog dedicated to watching Google. Meanwhile, the official Google weblog is here.

Posted by vivek at 10:48 AM | Comments (0) | TrackBack

May 12, 2004

Registering and Discovering RSS feeds

Finding new and interesting RSS feeds is a problem, and so is advertising your own feed. I explored a few websites that aggregate feeds, and wasn't too satisfied with the experience.

Recently I found an article on Registering and Discovering RSS feeds in UDDI. Well, I went and did just that at the IBM UBR node- here is my discovery URL. I wrote a small Java program to browse the UDDI registry looking for RSS feeds, and didn't find a whole lot. The ones I did find were the really good ones though- like Don Box's Spoutlet, Karsten Januszewski's UDDI Weblog, Tim Ewald's Ideas about XML and Web Services ... and of course now, yours truly!

My Java program for searching RSS feeds is here, and the setup instructions are here.

The RSS feeds I found (as of May 12, 2004) are listed below- many of them are weblogs of Microsoft folks.

Title | URL
Don Box's Spoutlet | RSS 0.9, RSS 1.0
Drewby.net | RSS 0.9
Jazz 88 News at KSDS-FM.org | RSS 0.9
Karsten Januszewski's UDDI Web Log | RSS 0.9, RSS 1.0
Kirby T @thecave.com | RSS 0.9
Tim Ewald's Ideas about XML and Web Services | RSS 0.9, RSS 1.0
Vivek Chopra's weblog on SOA and Web services | RSS 1.0
Christian Weyer: Web Services & .NET | RSS 2.0
Clemens Vasters: Enterprise Development & Alien Abductions | RSS 2.0
Laurie Thompson-Earls | RSS 2.0
Link to MSDN Just Published | RSS 2.0
Matevz Gacnik's Web Log | RSS 2.0
Michael Earls | RSS 2.0

Not a whole lot of feeds as you can see. Check out Syndic8 if you want more RSS feeds- 100,772 at last count (May 12, 2004)!

[Update 10:30 PM, May 12 2004] I found this great resource on RSS which has a lot of information on registering and discovering RSS feeds too.

Posted by vivek at 03:58 PM | Comments (1) | TrackBack

May 11, 2004

Performance best practices for Web services

Performance is always a concern with Web services- there are better ways to design a performant distributed system than sticking it behind an HTTP port and throwing verbose XML messages at it.

Lately I've been exploring some mechanisms to address performance issues for Web services. Most of these approaches target bottlenecks at the lower level SOAP layer, rather than at the design level. I've looked at some articles and technical papers that present metrics and guidelines. From them I've distilled the following performance best practices:

  • Design your Web service interface to minimize the network traffic. A 'coarse-grained' API is better, as you minimize the number of requests a client has to make to get information. [1][6]
  • Large SOAP messages are a performance bottleneck due to time spent parsing them. Keep your payload size as small as possible. [1]
  • Complex SOAP messages are a performance bottleneck due to time spent serializing/deserializing them. Keep your payload complexity low. However, payload complexity and payload size are often design tradeoffs. [1]
  • SOAP intermediaries (gateways, proxies) should minimize parsing of messages.[1]
  • Use better XML parsing techniques. [3] For most applications, event driven parsers (SAX style) are more performant than DOM style parsers. [1][6]
  • Document/Literal style SOAP messages are smaller and less complex than RPC/encoded messages. [1][5]
  • Security has performance costs. Not all SOAP traffic needs to be secure. The performance cost of end-to-end security (e.g. WS-Security) is, in most cases, higher than that of a transport level security mechanism like SSL. [1]
  • Caching is a way to improve performance for processor-intensive services, though this is applicable only to read-only services. [2]
  • Many of the performance best practices for web applications will apply here too (using EJBs v/s JavaBeans, passing-by-reference of EJB components, Hardware and capacity settings, JVM setting etc.) [2]
  • Persistent connections are good for performance when there are a large number of messages with small payloads. For larger messages, this has less of an effect [3]. HTTP keep-alive is a way [2] to request that an HTTP connection persist, though this is the default in HTTP/1.1.
  • Streaming connections are good for performance in case of a large payload size. HTTP 'chunked encoding' is a kind of streaming, and is supported by HTTP/1.1. [3][6]
  • Binary encoding of some payload elements should be considered. [3]
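
As one illustration of the caching practice in the list above, here is a minimal sketch of a time-limited cache in front of a read-only operation. The `ReadOnlyCache` class and the `expensive_report` function are hypothetical names, and the 30 second lifetime is an arbitrary assumption; the cited articles do not prescribe this particular design.

```python
import time


class ReadOnlyCache:
    """Cache results of read-only calls for a fixed time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested without sleeping
        self._entries = {}         # key -> (expires_at, value)

    def get_or_call(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


def expensive_report():
    # Stand-in for a processor-intensive, read-only service operation (hypothetical).
    return sum(i * i for i in range(100_000))


cache = ReadOnlyCache(ttl_seconds=30)
first = cache.get_or_call("report", expensive_report)   # computed
second = cache.get_or_call("report", expensive_report)  # served from the cache
print(first == second)
```

This only makes sense for operations without side effects; anything that updates state has to bypass or invalidate the cache.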

However, all said and done, remember the reason you are choosing Web services- interoperability across heterogeneous environments- and not performance!

References:

[1] Holt Adams. Web services performance considerations, Part 1. http://www-106.ibm.com/developerworks/library/ws-best9/

[2] Holt Adams. Web services performance considerations, Part 2. http://www-106.ibm.com/developerworks/webservices/library/ws-best10/

[3] Kenneth Chiu, Madhusudhan Govindaraju, et al. Investigating the Limits of SOAP Performance for Scientific Computing. http://www.extreme.indiana.edu/xgws/papers/soap-hpdc2002/soap-hpdc2002.pdf

[4] Madhusudhan Govindaraju, Aleksander Slominski et al. Requirements for and Evaluation of RMI Protocols for Scientific Computing. http://www.sc2000.org/techpapr/papers/pap.pap261.pdf

[5] Frank Cohen. Discover SOAP encoding's impact on Web service performance http://www-106.ibm.com/developerworks/webservices/library/ws-soapenc/

[6] Dan Davis and Manish Parashar. Latency Performance of SOAP implementations http://www.caip.rutgers.edu/TASSL/Papers/p2p-p2pws02-soap.pdf

[Update May 12, 2004] I found another website that consolidates performance best practices.

Posted by vivek at 11:32 PM | Comments (0) | TrackBack

May 07, 2004

PayPal's Web services

Another exciting new development in the industry adoption of Web services- PayPal announced a (beta) Web service API. This is the biggest announcement since, well, the Google and Amazon Web service APIs. PayPal is the de facto standard for online payments, and so this is big, big news.

The Web services consist of four new informational and transactional APIs that enable developers to create e-commerce solutions and applications that integrate with the PayPal platform.

The API calls are:


  • TransactionSearch: Based on specified search criteria such as payment date or customer name, returns a set of matching transaction IDs and basic transaction details.

  • GetTransactionDetails: For a given transaction, returns all details associated with the transaction, such as customer email address, time of payment, and purchase details.

  • RefundTransaction: For a given transaction, reverses the transaction and issues a refund or partial refund to the purchaser.

  • MassPay: Transfers funds to one or many recipients by providing an automated alternative to cutting paper checks or

You need to sign up at PayPal Developer Central to access the API manuals, code samples and developer forums.

I'll do just that- watch this space!

Posted by vivek at 10:04 PM | Comments (0) | TrackBack

May 04, 2004

Website update: Web service implementations

I just finished adding a new web page on Web service implementations.

This is still a work in progress, but it has a fairly exhaustive set of Security implementations listed; both hardware as well as software products. It's interesting how fast this area is growing. Web service security was a major concern for everyone just a year or so ago; now there are over a dozen products. I see a shakeout on the horizon!

Posted by vivek at 11:36 PM | Comments (0) | TrackBack

May 03, 2004

RSS/RDF: What the future may hold

I recently read a bunch of predictions on what RSS can do/will do in the near future to information- how we find it, and how we consume it.

Steve Gillmor in his article predicts that in 2004, RSS information routers will have the following features:

  • Persistent storage of XHTML full-text/graphics/audio/video of RSS feeds
  • XPath search across local and Net stores
  • Self-forming and reordering subscriptions lists based on the aggregated priorities of user-chosen domain experts
  • Use of IM notification for post notification to aggregate affinity groups and active conversations
  • Integration of Hydra-like collaborative tools for multi-author conference transcripts
  • Videoconferencing routing and broadcast/recording tools
  • Integration of speech recognition and real-time indexing to allow quoting of linear audio and video streams
  • Mesh networked peer-to-peer synchronization engine for item propagation across shared spaces on multiple clients, including phones; iPods; and eventually Longhorn PDAs (circa 2006).

He goes on further to predict new applications that will emerge:

  • Metadata-driven directories that dynamically create RSS feeds based on affinity
  • Virtual conferences
  • IM/RSS presence networks for rich collaboration and e-mail replacement
  • Content-generation tools based on small, routable XHTML objects
  • A DRM network with enough creative and hardware support to blunt the Microsoft/RIAA DRM threat to peer-to-peer port hijacking.

It's almost the middle of 2004 now, and I don't see any of this happening yet, but that is not to say that it won't happen.

Another area where there is a big opportunity for RSS/RDF is Knowledge management. Knowledge management is a significant problem in an organization. Valuable content is hidden away in individual silos- be they mail folders, group websites, CVS repositories, shared directories or databases. What is needed is some way to mark up this content with metadata, some way to search for this content, and some way to consume it.

The first part of the problem- rich, interoperable metadata- is hard enough. RDF is the mechanism that RSS uses to keep metadata about news items. RSS uses RDF to provide a simple ontology system (an ontology is a way of classifying something, such as in a hierarchy, and being able to infer a relationship with other things) for web resources. So you can describe an item using title, author, subject, date published, keywords etc. However, RDF is far more capable than that, and can be used to describe more complex things too, such as Gene Ontologies.
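
To make the metadata idea concrete, here is a small sketch that reads title, author, subject and date from an RSS 1.0 item. The feed content (URLs, author, dates) is made up for illustration; the namespaces are the standard RDF, RSS 1.0 and Dublin Core ones, and Dublin Core is assumed as the vocabulary for author, subject and date.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical feed with a single item; every value below is invented.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/posts/1">
    <title>Example post</title>
    <link>http://example.org/posts/1</link>
    <dc:creator>A. Author</dc:creator>
    <dc:subject>Web services</dc:subject>
    <dc:date>2004-05-03</dc:date>
  </item>
</rdf:RDF>"""


def item_metadata(feed_xml):
    """Return one dict of metadata per <item> in an RSS 1.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("rss:item", NS):
        items.append({
            # rdf:about identifies the resource the metadata describes
            "about": item.get(f"{{{NS['rdf']}}}about"),
            "title": item.findtext("rss:title", namespaces=NS),
            "creator": item.findtext("dc:creator", namespaces=NS),
            "subject": item.findtext("dc:subject", namespaces=NS),
            "date": item.findtext("dc:date", namespaces=NS),
        })
    return items


for entry in item_metadata(FEED):
    print(entry)
```

Because each property lives in its own namespace, an organization could in principle add its own vocabulary alongside Dublin Core without breaking readers that only understand the basic fields.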

You still have a problem of how to mark up data (an automatic classification v/s someone adding metadata manually), how to have a uniform way of classifying things within an organization, or across it; but that is a problem for these people to address! A nice, but slightly dated, discussion on Ontologies and Metadata can be found here.

However, once we do get data marked up, RSS is an excellent foundation on which to build technologies for consuming it.

Posted by vivek at 04:24 PM | Comments (0) | TrackBack

I am an RSS junkie... what if everyone else becomes one too?

I have become a big convert to RSS (Really Simple Syndication/RDF Site Summary- take your pick) recently. My blogroll is 60 newsfeeds long- and I can finally keep up with all my interests.

My trusty RSS reader goes and fetches the latest newsfeeds every few hours or so (it's configurable). RSS newsfeeds and blogs remind me somewhat of the good old days of usenet, when I would wait for my newsreader to load up stuff from the various comp.lang/comp.unix newsgroups I had subscribed to. I stopped using usenet about six years back, when the 'noise' became intolerable, and I had to wade through a bunch of newbie questions to get to useful stuff. With RSS, since I select the feeds I subscribe to, it's like a usenet newsgroup where I control the list of authors whose posts I want to see.

So what happens if everyone becomes an RSS convert like me? A recent Wired article discusses the effect on Internet traffic when this happens.

Already there are RSS readers that function from inside mail programs (Newsgator) or browsers (RSS Reader Panel, Aggreg8, NewsMonster)- and if this becomes a default 'feature' in IE/Outlook, it can drive a huge amount of traffic to a website. Unlike normal web browsing, an RSS reader visits a web site multiple times a day.

The article goes on to say what you can do about it- don't include a lot of content in the RSS feed- a headline, a couple of sentences and a URL is all that should be sent. If the reader is interested, [s]he can then visit the site for more information. This will help, but a bunch of buggy RSS readers (ones that don't check for changed content, or don't do it properly), or misconfigured readers (ones that look for changes very frequently) can still cause a lot of trouble. It would be interesting to see a solution to this, when it does become a real problem.
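
One well-known way for a reader to "check for changed content" properly is an HTTP conditional GET: the reader sends back the validators it received last time, and the server answers 304 Not Modified with no body if nothing changed. The sketch below shows only the reader-side logic. The `fetch` callable and `fake_fetch` server are hypothetical stand-ins for a real HTTP client, used so the example runs without a network; the header names are standard HTTP.

```python
def conditional_headers(state):
    """Build request headers from the validators saved on the previous poll."""
    headers = {}
    if state.get("etag"):
        headers["If-None-Match"] = state["etag"]
    if state.get("last_modified"):
        headers["If-Modified-Since"] = state["last_modified"]
    return headers


def poll(url, state, fetch):
    """Return the feed body if it changed since the last poll, else None.

    fetch is a hypothetical transport: fetch(url, headers) must return
    (status_code, response_headers, body).
    """
    status, response_headers, body = fetch(url, conditional_headers(state))
    if status == 304:
        return None  # unchanged: nothing was transferred beyond the headers
    # Remember the validators so the next poll can be conditional.
    state["etag"] = response_headers.get("ETag")
    state["last_modified"] = response_headers.get("Last-Modified")
    return body


def fake_fetch(url, headers):
    # Pretend server whose feed is always at version "v1" (made up for this sketch).
    if headers.get("If-None-Match") == '"v1"':
        return 304, {}, b""
    return 200, {"ETag": '"v1"', "Last-Modified": "Mon, 03 May 2004 18:00:00 GMT"}, b"<rss/>"


state = {}
print(poll("http://example.org/index.rdf", state, fake_fetch))  # first poll downloads the feed
print(poll("http://example.org/index.rdf", state, fake_fetch))  # second poll sees no change
```

This does not stop a misconfigured reader from polling too often, but it keeps each unnecessary poll down to a few hundred bytes.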

Posted by vivek at 02:55 PM | Comments (0) | TrackBack