07 December 2013

Taking It On The Road, Travel Technology 2013

I spent much of this year mountaineering in Europe, and re-visiting India after 34 years. Since my father's passing earlier this year, I've been free to travel again. My father was my chief technology influence. People often ask how I got into technology, since my education was in psychology and most of my career was in tourism. It was all due to my father, Lucian J. Endicott Jr., who worked nearly three decades for IBM, and then became a professor of computer science before retiring altogether.

On this journey I've been watching closely to see what technologies I find most useful. Unlike most of the young people traveling today, I'm not traveling with a phone. I did have an Apple iPod Touch for a while, which I enjoyed, but passed it on in favor of the new Google Nexus Android tablet. I find phones and tablets great for everything other than real work, like programming. I did buy the most economical, top-rated Consumer Reports laptop for students, and have been very happy with it.

A man can only travel with so many devices, though; so the Apple iPod Touch and Android tablet both went to nieces, and I'm still happily traveling with my affordable laptop. In both Europe and Asia I have found locally available, prepaid "data cards" or "dongles", basically a phone chip on a USB stick, very helpful for freeing myself from dependence on wifi. However, some of the new, higher end phones come with built-in wifi "hotspot" capability, which I've seen quite a few young people using with their laptops. Without a phone per se, Skype has proven super convenient -- especially premium Skype calls to local landlines and SMS -- for literally calling from anywhere to anywhere.

I have to say that I use Skyscanner a lot, and feel it's saved me a tremendous amount of money. The only caveat is that some of the smallest new budget airlines are not included. For accommodation, I have tried both Couchsurfing and Airbnb for the first time this trip. I've actually found Couchsurfing more useful for meeting interesting, colorful people at my destinations than for easily organizing free overnights. I have also been surprised by the number of people running accommodation operations under the radar via Airbnb, rather than truly private persons renting spare rooms, though I've been satisfied with the service.

Problematic as it may be, I do find myself using Wikitravel precisely because it provides less information rather than more. I like Wikitravel because it gives me a really quick overview of the high points, what to see and do, even for out of the way destinations. I sometimes download the entire page for offline reading, when there is no wifi available. I find accommodation is both its strongest and weakest point: strong because anyone can add to it, so it often lists places under the radar; weak because the listings are totally unorganized and unrated. Because of this deficiency, I often find myself referring to TripAdvisor reviews to double-check the lower end accommodations.

Another relatively new technology I'm using a lot is Google Maps, frequently for directions of course, but particularly for the "Search nearby" capability, which I find delightful for discovering new places of all kinds, many perhaps never visited by tourists before. Especially in India, directions should be taken with a grain of salt, because places usually seem to be "pinned" imprecisely, so caveat emptor. I find screenshots, easily captured with the "prt sc" key, great for saving Google Maps directions for convenient offline reading.

More than ever before, travel for me is more about people than places. These days I love to visit with friends, old and new, at home or abroad. The reality is that people are now using Facebook more for personal communications than email. Facebook even makes it easier for people traveling in the same regions to meet up along the way. Another reality is that places, particularly low end places, are as likely to have Facebook pages as websites, essentially turning Facebook into its own parallel universe – no other Internet required. In fact, I only use Facebook for people, places/pages and events – and no other bells or whistles, such as innumerable travel apps, etc.

It was Facebook (and my nieces) that finally made me break down and get a small camera (Nikon Coolpix) for posting travel snaps as I go along, often the same day. Though that could be the number one use for phones I'm seeing on the road, not only for taking pictures but also for uploading them in virtual realtime….

04 September 2013

Dissecting the Summarization Process

This is in effect a mid-2013 progress update. As with many of my blog posts, this is as much a status update for me to get a better handle on where I'm at as it is to broadcast my progress.

mendicott.com is a blog reflecting on my journey with the overall project. This blog started seven years ago, in 2006, with my inquiry into The difference between a web page and a blog.... I had then returned from something like five years of world travel to find the digerati fawning over the blogosphere. At first, I failed to see the difference between a blog and a content management system (CMS) for stock standard web pages. Upon closer examination, I began to realize that the real difference lay in the XML syndication of blog feeds into the real-time web.

meta-guide.com is an attempt to blueprint, or tutorialize, the process. My original Meta Guide 1.0 development in ASP attempted to create automated, or robotic, web pages based on XML feeds from the real-time web. Meta Guide 2.0 development was based on similar feed bots, or Twitter bots, in an attempt to automate, or at least semi-automate, the rapid development of large knowledgebases from social media via knowledge silos. Basically, I use knowledge templates to automatically create the knowledge silos, or large knowledgebases. The knowledge templates are based on my own, proprietary "taxonomies", or more precisely faceted classifications, painstakingly developed over many years.

gaiapassage.com aims to be an automated, or semi-automated, summarization of the knowledge aggregated from social media by feed bots via the proprietary faceted classifications, or knowledge templates. Right now, I'm doing a semi-automated summarization process with Gaia Passage, which consists of automated research in the form of knowledge silos being "massaged" in different ways, but ultimately manually writing the summarization in natural language. This is allowing me to analyze and attempt to dissect the processes involved in order to gradually prototype automation. Summarization technologies, and in particular summarization APIs, are still in their infancy. Examples of currently available summarization technologies include automatedinsights.com and narrativescience.com. The overall field is often referred to as automatic summarization.

In the future, the Gaia Passage human readable summarizations will need to be converted into machine readable dialog system knowledgebase format. The dialog system is basically a chatbot, or conversational user interface (CUI), into a specialized database, called a knowledgebase. Most common chatbot knowledgebases are based on, or compatible with, XML, such as AIML for example. Voice technologies, both output and input, are generally an additional layer on top of the text based dialog system.

The two main bottlenecks I've come up against are what I like to call artificial intelligence middleware, or frameworks -- the "glue" to integrate the various processes -- and adequate dialog system tools, in particular chatbot knowledgebase tools with both "frontend" and "backend" APIs (application programming interfaces): a dialog system API on the frontend, with a backend API into the knowledgebase for dynamic modification. My favorite cloud based "middleware" is Yahoo! Pipes, which is generally referred to as a mashup platform (aka mashup enabler) for feed based data; however, Yahoo! Pipes has severe performance issues, so I don't really consider it to be a production ready tool. Like Yahoo! Pipes, my ideal visual, cloud based AI middleware could or should be language agnostic -- eliminating the need to decide on a single programming language for a project. I have also looked into scientific computing packages, such as LabVIEW, Mathematica, and MATLAB, for use as potential AI middleware. Additionally, there are a variety of both natural language and intelligent agent frameworks available. Business oriented cloud based integration, including visual cloud based middleware, is often referred to as iPaaS (integration Platform as a Service), integration PaaS, or "Integration as a Service".

The recent closure of the previously open Twitter API with OAuth has set my feed bot, or "smart feed", development back by years. Right now, I'm stuck trying to figure out the best way to use the new Twitter OAuth with Yahoo! Pipes, for instance via YQL, if at all. And if that were not enough, the affordable and user-friendly dialog system API, verbotsonline.com, that I was using went out of business. There are a number of dialog system API alternatives, even cloud based dialog systems, but they are neither free nor cheap, especially for significant throughput volumes. Still to do: 1) complete the Gaia Passage summarizations, 2) make Twitter OAuth work, use a commercial third party data source (such as datasift.com, gnip.com or topsy.com), or abandon Twitter as a primary source (for instance concentrate on other social media APIs instead, such as Facebook), 3) continue the search for a new and better dialog system API provider.
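For context, the heart of the new requirement is OAuth 1.0a request signing, which simple feed tools like Yahoo! Pipes could not do on their own. Below is a stdlib-only Python sketch of just the HMAC-SHA1 signing step; the endpoint URL matches the then-current Twitter 1.1 API, while all keys, tokens, nonce and timestamp values are placeholders, not real credentials:

```python
import base64, hashlib, hmac, urllib.parse

def oauth1_signature(method, url, params, consumer_secret, token_secret):
    """Compute an OAuth 1.0a HMAC-SHA1 signature over a request.

    Steps: percent-encode and sort the parameters, build the signature
    base string (METHOD&url&params), then HMAC-SHA1 it with a key made
    from the two secrets, and base64-encode the digest."""
    enc = lambda s: urllib.parse.quote(str(s), safe="")
    param_str = "&".join(f"{enc(k)}={enc(v)}" for k, v in sorted(params.items()))
    base = "&".join([method.upper(), enc(url), enc(param_str)])
    key = f"{enc(consumer_secret)}&{enc(token_secret)}"
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Placeholder credentials, purely for illustration.
sig = oauth1_signature(
    "GET", "https://api.twitter.com/1.1/statuses/user_timeline.json",
    {"oauth_consumer_key": "xxx", "oauth_nonce": "abc",
     "oauth_timestamp": "1378300000", "oauth_token": "yyy",
     "oauth_signature_method": "HMAC-SHA1", "oauth_version": "1.0",
     "screen_name": "mendicott"},
    "consumer-secret", "token-secret")
print(sig)
```

The signature then travels in the request's Authorization header; it's this extra handshake, trivial in code but impossible in a purely visual feed mashup, that broke the old workflows.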

Most basically, the Gaia Passage project is a network of robots that will not only monitor social media buzz about both the environment and tourism but also interpret the inter-relations, causes and effects, between environment and tourism -- such as how climate change affects the tourism industry, negatively or positively, or even what effects the weather has on crime trends for a particular destination -- as well as querying these interpreted inter-relations, or "conclusions", via natural language. If this can be accomplished with any degree of satisfaction, either fully automated or semi-automated, then the system could just as easily be applied to any other vertical. Proposals from potential sponsors, investors, or technology partners are welcomed, and may be sent to mendicot [at] yahoo.com.

13 March 2013

A New Website For A New Age: GaiaPassage.com

GaiaPassage.com is subtitled "Marcus L Endicott's favorite tips for green travel around the world". I'm calling it a deep green, eco-centric travel guide to the whole Earth. My Gaia Passage project will be a handwritten ecotourism guide to the entire world, based on the circa 250 ccTLDs. The general idea is to write a "white paper" for every country in the world, on environmental and cultural conditions, issues, and who is doing what about them, as well as examining both how they affect tourism and how tourism affects those issues. Anyone could write a lot about something, but the idea here is to provide "snapshots", or "bite sized" summaries, of only the best information and contacts. The name "Gaia Passage" originally came from my pre-Internet (mid-1980s) travel tips newsletter. The site is a work in progress; so far, I've completed the entire Western Hemisphere.
GaiaPassage.com is handwritten, but based on automated research and an automated outline. Primary research is based on data mining 20 years of Green Travel archives. Secondary research is based on multiple years of Meta Guide Twitter bot archives. Significance is based on primary sources in the form of root website domains, and/or secondary sources in the form of Wikipedia entries. In other words, if there is not a root website domain name or a Wikipedia entry, then it is unlikely to be included. (However, almost anything may be included in Wikipedia - if properly referenced.)

I have noticed that many websites of smaller concerns are going down, offline, apparently due to the economic downturn. However, social media such as Twitter and Facebook do present affordable alternatives to owning a root domain website, and I will take these into consideration when appropriate. (In other words, when something is really cool.)  I have also noticed a lot of people using Weebly to make free websites. (Note, GaiaPassage.com currently uses the free Google Sites platform.)

In the early evolution of a website, especially for large projects, it's important to first have the "containers" in place as "placeholders", which is no small task in itself. With circa 250 countries and territorial entities, that's a whole year's full-time work for one man, revising one country per working day. This would mean initial completion by December 2013. Eventually, GaiaPassage.com entries may morph into socialbots, or conversational assistants, containing not only all the knowledge about sustainable tourism gleaned from past Green Travel archives, but also current knowledge resulting from the Meta Guide Twitter bots.

In my previous blog, 250 Conversational Twitter Bots for Travel & Tourism, I detailed my 250 Meta Guide Twitter bots, one for every country and territory in the Internet ccTLD system. Basically, I've spent the past five years working on artificial intelligence and conversational agents - and tweeting about it all the while (links below). I had been using Twitter extensively as a framework; however, Twitter has become increasingly protectionist, most dramatically illustrated by the high-profile 2012 Twitter-LinkedIn divorce. The Twitter API has become a moving target, and it is just too costly for me to keep playing catch-up. In short, I find the "Facebook complex" of Twitter management immensely annoying, and decided to stop contributing original content; so, my New Year's resolution was to stop tweeting manually, at least for all of 2013. Further, my excellent dialog system API, VerbotsOnline.com, went out of business in 2012. Any other good dialog system API I found to replace it turned out to be much too expensive. As a result, all my conversational agents are shut down, at least for 2013. My hope is that the sector will shake out and/or advance during the year, and better or at least more affordable conversational tools will become available next year.

19 June 2012

250 Conversational Twitter Bots for Travel & Tourism

The reason I haven't updated this blog in almost a year is that I have moved most online development to my Meta-Guide.com website. In the previous two postings, I began testing my content repatriation strategy -- in other words, aggregating my own content from around the web -- which I've continued on the Meta Guide website, concentrating on seeding new webpages by mining the past four years of my own tweets. I have also made a prototype summarizer, which I am now training on my Meta Guide website in order to extract content from it to add on top of the mined tweets when building out new webpages. At the moment, I have three immediate goals: I would like to reach 10,000 tweets, 1,000 Meta Guide webpages, and 100 theses in AI & NLP (from the past 10 years). I only have about another 3,000 tweets to go (so maybe another year), about 300 webpages left to make, and fewer than 30 more theses to discover.

This past weekend, within view of the spectacular Colorado Rocky Mountains, I succeeded after some struggle in making my 250 Meta Guide Twitter bots conversationally interactive on Twitter. These are 250 manually constructed Twitter bots, one for every country, based on country-code top-level domain. That includes one for each of the 193 member states of the United Nations, plus an additional 57 various and sundry territories included in the ccTLDs. All of these Meta Guide Twitter bots are powered by my @VagaBot, a single cloud-based Verbot engine from Verbotsonline, using the undocumented API and connected to Twitter via Yahoo! Pipes. Previously they have just been retweeters, aggregating country-specific travel and tourism tweets. The next phase of development will involve marrying the incoming retweets to the outgoing responses in some meaningful way -- in other words, datamining the incoming retweets and attempting to process them semantically into answers.

You should now be able to tweet questions at any of the Meta Guide Twitter bots via @ mention. Currently, message turnaround time is running up to 30 minutes, which is par for the course on Twitter. Among other things, replies contain lines from my travel books; see Vagabond Globetrotting 3 & From the Balkans to the Baltics. If you are interested in learning more about me and what I do, I recommend watching both Part 1 & Part 2 of my recent videos on "Open Chatbot Standards for a Modular Chatbot Framework", presented in Philadelphia at Chatbots 3.2: Fifth Colloquium On Conversational Systems. If you need help with socialbots for your social CRM, I am available for consulting; just check my Contact page for details, follow me on Twitter, or connect on LinkedIn, and let's Skype!

12 July 2011

My own posts to the Robitron group since 2008

The Robitron discussion list is a Yahoo! Group started by Robby Garner in 2002 (hence the name "Robi-tron"), that has not only become the de facto Loebner Prize feedback channel, but also functions as the online "water cooler" for Turing-class chatbot developers. Archives of the Robitron group are only available to group members. The following is a reverse-chronological listing of my own posts to the Robitron group since 2008, to date.


Thanks for your questions. I think your position is not different from that of many if not most people. It took me a long time for the significance of blogs to sink in. As you have identified, it's the feed that's the distinguishing factor.

It is extremely easy to automate a Twitter account using services like http://twitterfeed.com and http://dlvr.it . You just stick your feed in, and voilà. For instance, it's my belief that not enough chatbots are taking advantage of chat log feeds; in fact, the only one I know for sure who was using it was Liz Perreau's ShakespeareBot (apparently offline at the moment).
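Under the hood, such services do little more than the following; here is a minimal stdlib-only Python sketch (the sample feed and the 140-character limit are illustrative, and a real pipeline would fetch the feed over HTTP and post each string via the Twitter API rather than printing):

```python
import xml.etree.ElementTree as ET

def feed_to_tweets(rss_xml, limit=140):
    """Turn RSS <item> entries into tweet-sized strings: title plus link,
    truncating the title so the whole tweet fits within `limit` characters."""
    tweets = []
    for item in ET.fromstring(rss_xml).iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        room = limit - len(link) - 1  # one space between title and link
        if len(title) > room:
            title = title[:room - 1] + "…"
        tweets.append(f"{title} {link}".strip())
    return tweets

# Dry run on a tiny sample feed.
sample = """<rss><channel>
  <item><title>Green travel tips</title><link>http://example.com/1</link></item>
</channel></rss>"""
print(feed_to_tweets(sample))
# → ['Green travel tips http://example.com/1']
```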

Twitter is super easy to mix and match, because you can just as well manually tweet from the same account that you've automated with one or more feeds -- so, very versatile. In fact, there are myriad services available to further automate and semi-automate tweets and replies in various ways (often for corporate use).

Using a push/pull model, that would cover push, and "datamining" Twitter would then come under pull. There are MANY applications and services available for "datamining" Twitter, and Twitter itself is beginning to buy and try to integrate some of them.

A Twitter account actually consists of two basic feeds: 1) your postings to Twitter, and 2) all the postings of those you're following. The way "following" works on Twitter allows you to "build" your own highly individual and unique feed. Technically, Twitter is a microblog network, and so these "one liner" status updates can be thought of as a kind of extension of blogging. Similar to blogging, people may not be "talking" to others, but simply "twittering" or posting their thoughts; therefore, Twitter can also be thought of as a "thought network". Part of the brilliance of the 140-character SMS-compatible status line is that it conforms more or less to one average sentence.

I've gotten to the point now where I'm not actually reading the feed of people I follow on Twitter, but have instead finely tuned my Twitterbots to return only that "datamined" info that conforms most closely to my interests. (I'm actually using http://summify.com to periodically summarize and prioritize the feed of people I follow on Twitter.) I use Facebook internally, strictly for personal contacts; I know every single friend face to face. Whereas I use Twitter for external, outward-facing, and potentially professional contact.

One man's trash is another man's treasure


This is an all too common misunderstanding of Twitter by novices, and worth taking just a minute to try and clarify here. It's becoming clear to me that people who are good at programming AIs don't seem to waste a lot of time on social media. Take a champion like Bruce Wilcox, for example: I really miss some focused "lifestreaming", a personal blog, or at least some kind of up-to-date "homepage"; but obviously, he's got other priorities.

The key to understanding Twitter is that it's NON-LINEAR, unlike conventional discussion fora. It's like the mother of all knowledgebases, or something like one big CHATBOT. Twitter is a combo of ARMIES of Twitterbots churning away, plus a mechanical turk of something on the order of 200 million souls. I do think Robert Medeksza "gets it", as his http://twitter.com/UltraHal has apparently been LEARNING from Twitter for some time.

Two final points: 1) perhaps the main point of twittering is just good old-fashioned SEO, 2) Twitter is the Internet's principal, and infinitely customizable, "feed exchange" or "feed interchange" (which is probably why Apple decided to integrate Twitter into their latest operating system upgrade).

Best of luck


That's cool, but still no way for punters to test drive Chip Vivant?

BTW, is it really ChipVivant or Chip Vivant, one word or two? ;^)

FWIW, twittering is a superb way to get this kind of thing out into the sea of data.


Have you "blogged" recently a complete list (overview) of domains and projects that you're involved with ?!?

> agent technology, metadata and crowd sourcing

Good day Amanda,

Just a few thoughts to get going.. I recently saw a great AiGameDev.com online interview with Bruce Wilcox about his ChatScript, which more or less covered the same ground as the online article, "Beyond Façade: Pattern Matching for Natural Language Applications" http://bit.ly/fs63c9 .. I really appreciated the clear explanations.. I didn't know anything about Façade previously.. Anyway, toward the end of the interview Bruce mentioned something like the next level of his development involved figuring out how to analyze books to extract things like personality or personal knowledge, perhaps like converting novels into chatbots.. This would almost certainly need to be done using metadata.. The point of the Façade angle seemed to be getting beyond keywords and phrases into mapping concepts with ontologies, etc.. One of the most common forms of crowdsourcing that comes to mind is reCAPTCHA (developed at CMU).. For example, what if every school child in the US were to enter their interpretations of novels into some machine readable form? In fact, school kids are now already using Oddcast Voki.com talking avatars to imitate historical figures..

You're not on Twitter?


Here's one example of the new wave of voice operated chat agents hitting the marketplace:

Speaktoit Virtual Talking Assistant

It's an Android app by http://www.speaktoit.com , apparently using the undocumented Google speech recognition API.

Of course, what I'd really like to see are such apps where I can plug any chatbot engine into the backend (via XMPP), such as Turin or any Loebner contender, and simply talk to it in the same way.

Something like "Open Chatbot Standards" might allow for further modularity in the form of an infinite array of avatar variations, not to mention freely pluggable voices, accents and recognition APIs.

I've been mostly offline the past four months traveling in South America, and am still in Buenos Aires today. I'm thinking about standards for chatbot commercialization, and wanted to let this brainstorm fly for any potential feedback on this subject.

Currently, I'm seeing three levels of products:

1) Intelligence (chatbot engines)

2) Avatar systems

3) Interactive speech technologies (TTS + STT)

My belief continues to be that XMPP is the lingua franca for chatbots, allowing them to communicate with other networked systems, including avatar systems, and indeed one another.

I don't know of many "modular" turnkey avatar systems yet. I suppose there is SitePal, and various SecondLife products could be considered similar. I understand Zabaware is working on something like this. I'm assuming some level of lipsync is built into avatar systems, or not.

Of course, the speech technologies are in great flux right now, particularly regarding web service APIs. How they may shake out is anyone's guess at this point, but I can imagine XMPP also being involved at this level, potentially for interfacing modularized avatar systems.

Somehow this notion of a standard "modularity" in this area seems to open a lot of ground for the participation of an even wider array of industries in the overall effort.

(I won't even get into the potential convenience of XMPP for interfacing "intelligences" with future hardbots, or physical robots, at this point.)

That's it for now! ;^) All feedback, both positive and negative, much appreciated!

Just a quick follow-up to point Robitroners to my latest blog post attempting to reverse engineer IBM Watson ..

Marcus L Endicott: How Many PlayStations Make A Watson?

Free Watson - IBM DeepQA test subject denied basic human rights

> [ http://twitter.com/statuses/user_timeline/226793352.rss ]

I've got an intelligent retweeter online above for those who would like to follow the buzz.

> [ http://twiterlist2rss.appspot.com/mendicott/lists/chatbotters/statuses.rss ]

It is part of my Twitter List above if you want to follow the broader community.

Season's greetings from snowy Colorado


Both you and Hamilton may wish to look at the "China Brain Project" being developed by Hugo de Garis and Ben Goertzel at Xiamen University Artificial Brain Lab http://ai.xmu.edu.cn/artificialbrain ..

Video ~ The China-Brain Project: Report on the First Six Months

PDF ~ The China-Brain Project: Report on the First Six Months

Ben Goertzel's other company http://biomind.com has a partner in Brazil, http://vettalabs.com based in Belo Horizonte.. Ben was born in Rio de Janeiro.. His father is a well known scholar of Brazilian culture..

So, it's good to have a Brazilian connection here on the list.. ;^)

After a cursory scan about semantic primes (aka semantic primitives), a number of things spring to mind..

1) pattern-matching AI

2) IVR grammars

3) conlangs or constructed languages

Semantic primitives may be the semantic equivalents of word "stemming", perhaps a kind of "semantic stemming"..

Like the utility of stemming in search, this "semantic stemming" might be employed in the semantic enhancement of pattern matching in AI..

Lately, I have been struck by the similarity of IVR grammars to basic AI pattern matching.. It seems to me there may be room for a much closer hybrid of the two..

The similarities between these various reductions and the constructed languages seem inescapable, which leads me to wonder what role the conlangs might play in aid of the semantization of AI..
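To make the analogy concrete, here is a toy Python sketch of what such "semantic stemming" might look like as a preprocessing pass before pattern matching. The prime inventory is invented purely for illustration, not a real NSM lexicon; the point is that several surface forms collapse to one prime, so a single pattern can match all of them:

```python
# Hypothetical word → semantic-prime mapping, for illustration only.
PRIMES = {
    "want": "WANT", "wish": "WANT", "desire": "WANT",
    "big": "BIG", "large": "BIG", "huge": "BIG",
    "say": "SAY", "tell": "SAY", "speak": "SAY",
}

def semantic_stem(tokens):
    """Replace each word with its semantic prime, if one is known;
    otherwise just lowercase it."""
    return [PRIMES.get(t.lower(), t.lower()) for t in tokens]

print(semantic_stem("I wish to tell you something huge".split()))
# → ['i', 'WANT', 'to', 'SAY', 'you', 'something', 'BIG']
```

A pattern written against primes ("* WANT * SAY *") would then cover "wish to tell", "desire to speak", etc., much as stemming lets one search term cover several inflections.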

Thanks Hugh, this led me (via citation) to the interesting work John Barnden has done with the E-Drama Project using WordNet to look into affect detection by metaphor in AI actors:

>>The Affective Norms for English Words (ANEW) provides a set of normative emotional ratings for a large number of words in the English language. This set of verbal materials have been rated in terms of pleasure, arousal, and dominance to complement the existing International Affective Picture System (IAPS, Lang, Bradley, & Cuthbert, 1999) and International Affective Digitized Sounds (IADS; Bradley & Lang, 1999), which are collections of picture and sound stimuli, respectively, that also include these affective ratings. The ANEW is being developed and distributed by the NIMH Center for Emotion and Attention (CSEA) at the University of Florida.<<

Hi Rob,

How far do you consider valence annotation from "sentiment analysis"?

I've been watching IBM's foray into sentiment analysis with their recent purchase of SPSS.

Doesn't FrameNet (http://framenet.icsi.berkeley.edu/) include valences?

Esteemed Robitroners..

I have arrived at the conclusion that IM-XMPP/Jabber will become the universal transport mechanism for conversational agents..

Therefore, I am searching for a bidirectional IM-Voice gateway of any kind.. in order to make *ALL* AI chatbots fully voice-interactive..

I am also searching for a Windows7-compatible desktop avatar (talking head) frontend, which can easily accept *ANY* IM-XMPP/Jabber backend..

Any pointers or suggestions would be most appreciated!

Here's what WolframAlpha answered when asked "how are you doing?" => http://twitpic.com/1pkb3p/full

And, here's a video demo of Siri => http://www.youtube.com/watch?v=dIWbbotLVds

There are 468 members on this Robitron YahooGroup, with maybe a dozen regular posters and another dozen periodic posters, which says to me that a lot of folks are paying close attention to what goes down in this group.

I have seen ample evidence in the popular literature that blackhat chatbots have talked plenty of people out of their personal details, not to mention sexbots talking their way into people's private lives.

Conversational agents are a form of search, certainly in the case of pattern-matching AI. Search and conversational agents are rapidly moving toward convergence, just look at WolframAlpha, widely considered a hot forerunner in the new wave of semantic search.

What is even bigger, the conversational interface has been predicted to become the next BIG technological disruption.

Hello? Who here is going to tell Apple that Siri is not $200 million worth of credible?


Have you looked at Google App Engine (http://appengine.google.com/)?

It's all about Python and Java..

Only issue seems to be their "BigTable" non-relational database..

Helio Perroni Filho has told me that his Chatterbean Java AIML interpreter could be used on AppEngine with some modifications..

I haven't heard of anyone yet attempting to use AIML with the AppEngine flat file database..

(If anyone could provide a concise critique of the difference between relational database and flat file database within the context of pattern matching AI, I would certainly welcome it..)

Thanks to David Levy's win.. I'm now happily following Huma Shah on Twitter at http://twitter.com/Turing100in2012

I've been following Dr Wallace at http://twitter.com/drwallace

8pla.net is there at http://twitter.com/8planet

Even Robby Garner is on Twitter at http://twitter.com/robitron

I would love to follow other Robitroners on Twitter.. Perhaps others could respond to this message with their Twitter link, making a de facto list of Robitron Twitterers..


BTW, I've got a little hack at http://twitter.com/robitron_list which alerts me to *new topics* on this Robitron group, without links.. Other Twitter power users are welcome to follow it too..


FYI, as far as the "Twuring test" is concerned, I've now got two different chatbots on Twitter, a Pandorabot at http://twitter.com/twaveladvisor AND a Conversive Verbot at http://twitter.com/vagabot

I'm parsing travel questions off the top of Twitter and sending the same stream into both bots.. Importantly, I'm replacing all @ signs with # hash in order to prevent annoying people by sending replies into their Twitter inboxes..


And, I'm still Twittering heavily myself about #chatbots and the coming #VoiceWeb at http://twitter.com/mendicott

Cheers to all, and especially David Levy for his success, from tropical Queensland!


This is detailed in my blog post "Corpus linguistics & Concgramming in Verbots and Pandorabots" at http://tinyurl.com/69xw9t . Pay particular attention to the comments following the post. The missing piece was a custom process done in SPSS by a statistical programmer friend of mine as a personal favor. For more info on concgramming, I suggest tracking down a copy of "From n-gram to skipgram to concgram" http://tinyurl.com/4hl3ow . I have been in contact with one of the authors, Chris Greaves, and just asked him for an explanation of the differences between concgramming and latent semantic analysis/latent semantic indexing.

BTW, I've just Twittered about shakespearebot.com, and love your RSS out feature; can you point me to other bots doing this?

Esteemed Robitron members :-)

I've been following Robitron for some months now, and would like to introduce myself. Some of you are already familiar with my work via the pandorabots-general list. (I like the pandorabots-general group because it's low-pressure and handles a lot of clueless questions gracefully.) So far, I've actually been more involved with Conversive Verbots than Pandorabots, mainly because of the ease (and cost) of the integrated graphical interface. However, I am in the process of moving to Program-E.

I am not interested in even trying to pass the Turing test; however, I do appreciate the Loebner Prize's stimulus to development, along the lines of the Paris-Dakar Rally. I am currently resonating with John Smart's expression of the "Conversational Interface" or "Conversational User Interface (CUI)". I simply want a conversational agent that works in a practical way along one vertical, in my case "green travel".

I want my bot to contain all the knowledge in my book, Vagabond Globetrotting 3, and to acquire knowledge from all the web feeds underlying my www.meta-guide.com site; in other words, to be able to "read" books and "learn" from RSS feeds.... I have already converted my book into AIML, and am currently working on a model to convert from RSS into AIML via semantic technologies; however, I would actually prefer not to reinvent the wheel, and to use off-the-shelf tools.
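
As a sketch of the RSS-to-AIML idea, here is a trivial generator that wraps one feed item into an AIML category. The field names and the upper-casing convention are my assumptions; a real pipeline would need proper pattern normalization and wildcards:

```python
from xml.sax.saxutils import escape

def rss_item_to_aiml(title: str, summary: str) -> str:
    """Wrap one feed item into a minimal AIML category:
    the title becomes the pattern, the summary becomes the template."""
    pattern = escape(title.upper())   # AIML patterns are conventionally upper-case
    template = escape(summary)
    return (f"<category><pattern>{pattern}</pattern>"
            f"<template>{template}</template></category>")

print(rss_item_to_aiml("What is green travel",
                       "Travel that minimizes environmental impact."))
```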

It seems to me at this point the real bottleneck is voice input. I don't know of any web site that actually accepts speech input over the web. It's not clear to me why I couldn't simply talk to a web site via VoIP, Skype for instance. I have played around with Windows speech recognition, which interfaces well with desktop Verbots; here, though, I'm looking for cloud solutions.

Lately, I've been heavily twittering about the convergence of chatbots with semantic technologies at http://twitter.com/mendicott . I've only actually known about Twitter for some months and am finding it intriguing, now referring to it as a "thought network", very neuronal. I can even imagine a chatbot knowledgebase fed with Twitter feeds; the 140 character limit seems perfect for bot responses....

Good day from Sydney

06 July 2011

My Cleverbot Tweet-FAQ

This is an experimental "Tweet-FAQ", a cumulative listing of my tweets (microblog postings to Twitter) to date about the chatbot Cleverbot and its sister chatbot Jabberwacky, their creator Rollo Carpenter, and his companies Icogno and Existor.

  • According to Slashdot, Oct 2010, Cleverbot had 45 million lines of memorized user chat, at a rate of doubling every year http://t.co/CRfUpqG
  • http://existor.com .. "conversational AI for business, education and entertainment" .. @existor .. founded by Rollo Carpenter in 2008 ..
  • Not impressed w/ http://cleverbot.com/app "Cleverbot HD" ($2.99) interface.. "emotional avatar" is lame.. needs animated avatar w/ voice-io
  • Version 1.2 sees Cleverbot renamed Cleverbot HD http://tinyurl.com/32xxu74 .. Cleverbot iPhone / iPad app requires WiFi ($2.99) .. #Icogno
  • So I asked Cleverbot.com .. "Are you Bayesian?" .. and it replied "Yes" ..
  • http://liveenglish.ru .. George Jabberwacky teaches Russians English .. first simulator of spoken English .. whole day access only 39 rubles
  • "Learning, creating, phrasing" By Rollo Carpenter, 25th March 2010, Third colloquium on conversational systems http://tinyurl.com/ygbqyyf ..
  • Jabberwacky Cleverbot http://cleverbot.com "learns to be clever from real people, and its AI can 'say' things you may think inappropriate"

24 January 2011

How Many PlayStations Make A Watson?

"The words are just symbols to the computer. How does it know what they really mean?" - David Ferrucci

IBM Watson is IBM's project to create the first computer that can win the TV quiz show Jeopardy when pitted against human contestants, including Ken Jennings, record holder for the longest championship streak, and Brad Rutter, the current biggest all-time Jeopardy money winner. The resulting computer will be a contestant on Jeopardy next month, February 2011. I will try to give an overview here of what is known to date about IBM Watson from open sources. I'm writing this as much for my own learning as any other reason; so, give me a break if it gets a little fuzzy in the complicated parts; besides, IBM is not playing all their cards. Of course, I welcome any and all comments, corrections and clarifications. BTW, in the spirit of full disclosure, I am a so-called "IBM brat" having grown up in an IBM family; my father @ljendicott worked for IBM, 1960-1987.

According to the IBM DeepQA FAQ, the history of Watson includes both "Project Halo", the quest for a "digital Aristotle", and AQUAINT, the Advanced Question Answering for Intelligence program. In fact, David Ferrucci, principal investigator for the DeepQA/Watson project, has four publications listed in the AQUAINT Bibliography. The earliest version of Watson was a trial of IBM’s AQUAINT system called PIQUANT, Practical Intelligent Question Answering Technology, adapted for the Jeopardy challenge. Another question answering system, a contemporary of PIQUANT called Ephyra (now available as OpenEphyra), was used with PIQUANT in early trials of Watson, both by IBM and their partners at the Language Technologies Institute at Carnegie Mellon University (who are jointly developing the "Open Advancement of Question Answering Systems" initiative). One of the things OpenEphyra can do that Watson doesn't do at the moment is retrieve the answers to natural language questions from the Web.

IBM Watson is not a conversational intelligence per se, but rather a question answering system (QA system). It is fully self-contained and not connected with the Internet at all. Watson does have an active Twitter account at @IBMWatson, but it is operated by a group of Watson's handlers (using CoTweet). Watson has no speech recognition capability; questions are delivered textually. It does not have autonomous text-to-speech (TTS) capability: TTS must be triggered by an operator (ostensibly to avoid interruptions to the television performance). New York Times readers have tentatively identified the voice of Watson TTS as that of Jeff Woodman. Presumably, an IBM WebSphere Voice product is being used for Watson TTS.

The distinctive "face" or avatar of IBM Watson, about the size of a human head, expresses emotion and reacts to its environment. It was created by Joshua Davis with Adobe Flash Professional CS5 using the ActionScript HYPE visual framework, deployed via Adobe Flash Player 10.1. The avatar is connected to Watson via an XML socket server, which sends information about the computer's current mood or state, such as "I know the answer", "I won the buzz", etc. The avatar also reacts to Watson's voice by analyzing audio from a microphone.

IBM Watson is built on a massively parallel supercomputer. The hardware configuration is a room-sized system, about the size of 10 refrigerators: 10 racks containing 90 IBM Power 750 servers connected over 10 Gb Ethernet. Each Power 750 contains 4 POWER7 chips and 32 cores; the POWER7 is supposedly the world's fastest processor. IBM Watson has a total of 360 chips and 2,880 processor cores. It has 15 terabytes of RAM and a total data repository of 4 terabytes, consisting of two 2 terabyte (TB) I/O nodes. IBM Watson operates at some 80 teraFLOPS, or 80 trillion floating-point operations per second. (For comparison, both IBM Blue Gene and the AFRL Condor Cluster operate at some 500 teraFLOPS.)

Many sources, including the Wall Street Journal, claim Watson's 4 terabytes (TB) of storage contain some 200 million "pages" of content; Wired claimed only 2 million pages of data for Watson. 1 TB (or 1,024 GB) holds roughly the text of the books in a large library (or about 1,610 CDs' worth of data), and large municipal libraries may contain an average of 10,000 volumes. So, if a book averaged say 200 pages, then Watson should contain something closer to 8 million pages of content. Content sources include unstructured text, semistructured text, and triplestores.
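
The back-of-envelope arithmetic, using the rough figures above:

```python
tb_of_storage = 4             # Watson's data repository, per IBM
volumes_per_library = 10_000  # rough figure for a large municipal library
pages_per_book = 200          # assumed average

# If 1 TB is roughly one large library's worth of books:
books = tb_of_storage * volumes_per_library
pages = books * pages_per_book
print(pages)   # 8000000 -- far from the 200 million "pages" claimed elsewhere
```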

Watson's software configuration consists basically of SUSE Linux Enterprise Server 11, Apache Hadoop, and UIMA-AS. SUSE Linux Enterprise Server 11 is a Linux distribution supplied by Novell and targeted at the business market. Apache Hadoop is a software framework that supports data-intensive distributed applications, including an open source version of MapReduce, enabling applications to work with thousands of nodes and petabytes of data. UIMA-AS (Unstructured Information Management Architecture - Asynchronous Scaleout) is an add-on scaleout framework supporting flexible scaleout with Java Message Service. Hadoop facilitates Watson's massively parallel probabilistic evidence-based architecture by distributing it over the thousands of processor cores.

The DeepQA architecture has three layers: natural language processing (NLP), knowledge representation and reasoning (KRR), and machine learning (ML). The IBM Watson team used every trick in the book for DeepQA; apparently they couldn't decide which natural language processing techniques to use, so they used them all. Each of Watson's 2,880 processor cores can be used like an individual computer, enabling Watson to run hundreds if not thousands of processes simultaneously; for instance, each processor thread could host a separate search. All of the hundreds of components in DeepQA are implemented as UIMA annotators. Internal communication among processes is handled in UIMA by OpenJMS, an open source implementation of Java Message Service. The IBM Content Analytics product LanguageWare is used in Watson for natural language processing. According to David Ferrucci, Watson contains "about a million lines of new code".

Processing steps: (1) Question Analysis -> (2) Query decomposition -> (3) Hypothesis generation -> (4) Soft filtering -> (5) Evidence scoring -> (6) Synthesis -> (7) Merging and ranking -> (8) Answer and confidence
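
The eight steps can be sketched as a simple function pipeline. Every function name and value below is an illustrative placeholder of mine, not a DeepQA internal:

```python
def run_deepqa(clue, stages):
    """Thread a clue through the ordered processing stages."""
    state = {"clue": clue}
    for stage in stages:
        state = stage(state)
    return state

# Placeholder stages mirroring the eight steps above.
def analyze(s):     return {**s, "lat": "he"}                                      # (1)
def decompose(s):   return {**s, "subclues": [s["clue"]]}                          # (2)
def hypothesize(s): return {**s, "candidates": ["Ken Jennings", "Brad Rutter"]}    # (3)
def soft_filter(s): return {**s, "candidates": s["candidates"][:1]}                # (4)
def score(s):       return {**s, "scores": {c: 0.9 for c in s["candidates"]}}      # (5)
def synthesize(s):  return s                                                       # (6)
def merge_rank(s):  return {**s, "ranked": sorted(s["scores"],
                                                  key=s["scores"].get,
                                                  reverse=True)}                   # (7)
def answer(s):      return {**s, "answer": s["ranked"][0],
                                 "confidence": s["scores"][s["ranked"][0]]}        # (8)

result = run_deepqa("He holds the longest Jeopardy winning streak",
                    [analyze, decompose, hypothesize, soft_filter,
                     score, synthesize, merge_rank, answer])
print(result["answer"])   # Ken Jennings
```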

(1) Question Analysis:

In the UIMA architecture, the collection processing engine consists of the collection reader, analysis engine and common analysis structure. Collection level processing contains the entity registrar with event, entity and relation coreferencers, ultimately creating a semantic search index, the feature structure or common analysis structure store in XML and extracted knowledge database. The UIMA analysis engine consists of programs that analyze documents and infer information about them. The extracted knowledgebase resides in an IBM DB2 database.

Data in the common analysis structure can only be retrieved using indexes. Indexes are analogous to the indexes that are specified on tables of a database, and are used to retrieve instances of type and subtypes. In addition to a base common analysis structure index, there are additional indexes for annotated views, created by natural language processing techniques such as tokenization and named entity recognition.

In the Jeopardy game show, contestants are presented with clues in the form of answers, and must phrase their responses in question form. Watson receives questions or "clues" textually and then breaks them down into subclues. Question clues often consist of relations, such as syntactic subject-verb-object predicates and semantic relationships between subclues such as entities. A semantic search is where the intent of the query is specified using one or more entity or relation specifiers. Triplestore queries in the primary search are based on named entities in the clue. Watson can use detected relations to query a triplestore and directly generate candidate answers. Triplestore sources in Watson include dbpedia.org, wordnet.princeton.edu and YAGO (which itself is a combination of dbpedia, WordNet and geonames.org). Triplestore and reverse dictionary lookup can produce candidate answers directly as search results. Reverse dictionary lookup is where you look up a word by its meaning, rather than vice versa.
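
Generating candidate answers directly from a triplestore can be sketched in a few lines; the triples and the detected relation here are invented for illustration:

```python
# A tiny in-memory "triplestore": (subject, predicate, object) tuples.
TRIPLES = {
    ("Canberra", "capitalOf", "Australia"),
    ("Ottawa",   "capitalOf", "Canada"),
    ("Sydney",   "largestCityOf", "Australia"),
}

def candidates_from_relation(predicate, obj):
    """Return every subject standing in the detected relation to the named entity."""
    return [s for (s, p, o) in TRIPLES if p == predicate and o == obj]

# Clue: "This city is the capital of Australia" -> detected relation capitalOf(?, Australia)
print(candidates_from_relation("capitalOf", "Australia"))   # ['Canberra']
```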

(2) Query decomposition:

DeepQA supports nested decomposition, or query decomposition, a kind of stochastic programming, where questions are broken down into more easily answered subclues. Nesting means that an inner subclue is nested in the outer clue, so the subclue can be replaced with an answer to form a new question that can be answered more easily.
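
Nested decomposition in miniature: solve the inner subclue, splice its answer into the outer clue, and solve the now-simpler question. The clue and lookup table are invented for illustration:

```python
def answer_nested(outer_clue, inner_clue, inner_span, lookup):
    """Solve the inner subclue, substitute its answer into the outer clue,
    then solve the rewritten (easier) question."""
    inner_answer = lookup[inner_clue]
    rewritten = outer_clue.replace(inner_span, inner_answer)
    return lookup[rewritten]

LOOKUP = {
    "the capital of Australia": "Canberra",
    "the lake Canberra sits near": "Lake Burley Griffin",
}
print(answer_nested("the lake the capital of Australia sits near",
                    "the capital of Australia",
                    "the capital of Australia",
                    LOOKUP))   # Lake Burley Griffin
```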

(3) Hypothesis generation:

In constructing hypotheses, Watson creates candidate answers and intermediate hypotheses, and then checks hypotheses against WordNet for "evidence", dealing with hundreds of thousands of evidence pairs. Watson uses the offline version of WordNet, a lexical database that groups English words into synsets, or sets of synonyms, that provide definitions and record semantic relationships. Chris Welty, David Gondek, JW (Bill) Murdock and Chang Wang are the IBM Watson Algorithms Team machine learning experts. Wang in particular is an expert in "manifold alignment". In engineering, manifolds typically bring one into many or many into one. According to Wang, "Manifold alignment builds connections between two or more disparate data sets by aligning their underlying manifolds and provides knowledge transfer across the data sets". Watson uses logical form alignment to score grammatical relationships, deep semantic relationships or both. Inverse document frequency is used as a statistical measure of word importance. And the Smith-Waterman algorithm, originally developed for aligning biological sequences, compares the sequences of questions and candidate answers for evidence.
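
Inverse document frequency, one of the statistical measures mentioned above, is easy to show on a toy corpus:

```python
import math

def idf(term, documents):
    """log(N / df): the rarer a term is across the corpus, the more weight it carries."""
    df = sum(1 for doc in documents if term in doc.lower().split())
    return math.log(len(documents) / df) if df else float("inf")

DOCS = ["the cat sat", "the dog ran", "a manifold aligns data"]
print(round(idf("the", DOCS), 3))       # 0.405 -- common word, low weight
print(round(idf("manifold", DOCS), 3))  # 1.099 -- rare word, high weight
```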

(4) Soft filtering:

Soft filtering may consist of a lightweight scorer computing the likelihood of a candidate answer simply being an instance of the lexical answer type, or LAT. A LAT is a word in the clue that categorizes the type of answer required, independent of assigned semantics. Watson uses the lexical answer type for deferred type evaluation. Interestingly, Ferrucci's name is on an IBM patent (System And Method For Providing Question And Answers With Deferred Type Evaluation) that includes lexical answer type. In the patented method, query processing waits until a descriptor (type) is determined and a candidate answer is provided; then a search is conducted for evidence that the candidate answer has the required lexical answer type, or an attempt is made to match the LAT to a known ontological type (OT). The evidence from the different ways of determining that the candidate answer has the expected LAT is combined, and one or more answers are delivered to a user. The IBM Watson team found 2,500 distinct and explicit LATs in the 20,000-question Jeopardy Challenge sample; the most frequent 200 explicit LATs covered less than 50 percent of those.
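
A lightweight LAT scorer in miniature, checking a candidate's known types against the clue's lexical answer type (the type dictionary is invented for illustration):

```python
# Toy ontology mapping candidates to known types.
TYPES = {
    "Ken Jennings": {"person", "contestant"},
    "Canberra": {"city", "capital"},
}

def lat_score(candidate, lat):
    """1.0 if the candidate is a known instance of the LAT, else a low prior
    (soft filtering demotes rather than discards)."""
    return 1.0 if lat in TYPES.get(candidate, set()) else 0.1

print(lat_score("Canberra", "city"))       # 1.0
print(lat_score("Ken Jennings", "city"))   # 0.1
```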

(5) Evidence scoring:

There are two layers of machine learning on top of the many NLP processes. Learners at the bottom layer are called base learners, and their predictions are combined by metalearners in the upper layer. On top of the first learning layer is a reasoning layer, which includes temporal reasoning, statistical paraphrasing, and geospatial reasoning, in order to gather and weigh evidence over both the unstructured and structured content to determine the answer with the most confidence. Watson uses about 100 algorithms to rate each of up to some 10,000 sets of possible answers for every question. Trained classifiers score the output of each of the hundreds of NLP processes.

One type of scorer uses knowledge in triplestores for simple reasoning, such as subsumption and disjointness in type taxonomies, geospatial and temporal reasoning. Temporal reasoning is used in Watson to detect inconsistencies between dates in the clue and those associated with a candidate answer. Paraphrasing is the expression of the same message in different words. Statistical paraphrasing is the use of a statistical sentence generation technique that recombines words probabilistically to create new sentences. Geospatial reasoning is used in Watson to detect the presence or absence of spatial relations, such as directionality, borders and containment between geoentities.
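
Temporal reasoning of the kind described can be sketched as a date-consistency check (the lifespan table is invented for illustration):

```python
# Toy knowledge: candidate -> (birth year, death year or None if living).
LIFESPANS = {"Abraham Lincoln": (1809, 1865), "Barack Obama": (1961, None)}

def temporally_consistent(candidate, clue_year):
    """A candidate is inconsistent if the year in the clue falls outside
    the candidate's known span."""
    born, died = LIFESPANS[candidate]
    return born <= clue_year <= (died if died is not None else 9999)

# A clue mentioning 1863: Lincoln fits, Obama cannot.
print(temporally_consistent("Abraham Lincoln", 1863))  # True
print(temporally_consistent("Barack Obama", 1863))     # False
```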

(6) Synthesis:

Each subclue of every nested decomposable question is processed by a dedicated QA subsystem, in a parallel process. DeepQA then synthesizes final answers using a custom answer combination component. This custom synthesis component allows special synthesis algorithms to be easily plugged into the common framework.

Aditya Kalyanpur, Siddarth Patwardhan and James Fan are the IBM Watson Algorithms Team reasoning experts. In their 2010 paper, titled "PRISMATIC: Inducing Knowledge from a Large Scale Lexicalized Relation Resource", Kalyanpur, Fan and Ferrucci present a system for the statistical aggregation of syntactic frames. A syntactic frame is the position in which a word occurs relative to other classes of words, such as subject, verb, and object. In contrast, a semantic frame can be thought of as a concept with a script used to describe an object, state or event.

(7) Merging and ranking:

Watson uses hierarchical machine learning, a learning methodology inspired by human intelligence, to combine and weigh evidence in order to compute the confidence score, and through training it learns to be predictive. Watson merges answer scores prior to ranking and probabilistic confidence estimation, using a variety of matching, normalization, and coreference resolution algorithms. In this second level of machine learning, metalearner classification systems take classifiers and turn them into more powerful learners, using multiple trained models. Final ranking and merging evaluates hundreds of hypotheses based on hundreds of thousands of scores to identify the best one based on the likelihood it is correct.

(8) Answer and confidence:

After being trained on more or less the entire history of the Jeopardy game, the second level of machine learning kicks in to rank the merged scores using one or more metalearners that have learned to evaluate the results of the first level classifiers. The metalearner combines these predictions by multiplying the probabilities by weights assigned to each base learner and taking the average, and learning how to stack and combine the scores. The ultimate answer results from this statistical confidence.
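
The weighted-average combination described here, in miniature (the weights and base-learner probabilities are invented):

```python
def metalearner_confidence(probs, weights):
    """Combine base-learner probabilities for one candidate:
    multiply each by its learned weight and take the weighted average."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)

# Three base learners vote on one candidate answer.
probs = [0.9, 0.7, 0.8]
weights = [2.0, 1.0, 1.0]   # the first learner has earned more trust in training
print(metalearner_confidence(probs, weights))   # ~0.825
```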

So, how many PlayStations (PS3) would it take to make an IBM Watson? By my calculation, 320. The AFRL Condor Cluster took about 2,000 PS3s to make and does some 500 teraFLOPS; IBM Watson does 80 teraFLOPS. [500/80=6.25 & 2000/6.25=320] The cost of 320 PlayStations would be about $128,000, or roughly a third of the retail price for one IBM Power 750 32-core cluster at around $350,000. (In comparison, as of 2007 PCWorld put IBM's Blue Gene/P system cost at $1.3M per rack, and the Blue Gene/L at $800K.) Deep Blue was a $100 million project. I'm estimating the cost of IBM Watson at up to $50 million, including at least $18 million in labor and potentially up to $31.5 million in material costs. It should be noted that "Jeopardy! And IBM Announce Charities To Benefit From Watson Competition".
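
Spelling out that arithmetic, using the figures quoted above:

```python
condor_ps3s = 2000      # PS3 consoles in the AFRL Condor Cluster
condor_tflops = 500     # Condor's throughput, teraFLOPS
watson_tflops = 80      # Watson's throughput, teraFLOPS

ps3s_for_watson = condor_ps3s / (condor_tflops / watson_tflops)
print(ps3s_for_watson)         # 320.0
print(ps3s_for_watson * 400)   # 128000.0 -- at roughly $400 per console
```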

The Quora IBM Watson topic is a good place for questions about Watson. To learn more about what makes IBM Watson tick, I suggest watching the IBM video "Building Watson - A Brief Overview of the DeepQA Project" and reading the paper "Building Watson: An Overview of the DeepQA Project". Look for the 'updateable' ebook, "Final Jeopardy", by Stephen Baker. IBM operates an informative web site about their project at ibmwatson.com.

Additional sources:

Ante, Spencer E. "IBM Computer Beats 'Jeopardy!' Champs." The Wall Street Journal, 14 Jan. 2011.

Chambers, Mike. "Avatar for Watson Supercomputer on Jeopardy Created with Flash." Adobe Blogs, 13 Jan. 2011.

The Associated Press. "Computer Could Make 2 'Jeopardy!' Champs Deep Blue." 13 Jan. 2011.

Gondek, David. "How Watson 'sees,' 'hears,' and 'speaks' to Play Jeopardy!" IBM Research, 10 Jan. 2011.

Gustin, Sam. "IBM's Watson Supercomputer Wins Practice Jeopardy Round." Wired, Epicenter, 13 Jan. 2011.

McNelly, Rob. "Watson Follows in Deep Blue's Steps." AIXchange, 21 Dec. 2010.

Miller, Paul. "IBM Demonstrates Watson Supercomputer in Jeopardy Practice Match." Engadget, 13 Jan. 2011.

Morgan, Timothy P. "Power 750: Big Bang for Fewer Bucks Compared to Predecessors." IT Jungle, 16 Aug. 2010.

NOVA. "Will Watson Win on Jeopardy!?" WGBH, 20 Jan. 2011.

Rhinehart, Craig. "10 Things You Need to Know About the Technology Behind Watson." Craig Rhinehart's ECM Insights, 17 Jan. 2011.

Thompson, Clive. "What Is I.B.M.'s Watson?" NYTimes.com, 16 June 2010.

Wallace, Stein W., and W. T. Ziemba. Applications of Stochastic Programming. Philadelphia: Society for Industrial and Applied Mathematics, 2005.

Will, Steve. "You and I: IBM Watson's Storage Requirements." iDevelop, 11 Jan. 2011.

= = =

Appendix 1: Chronological bibliography of David Angelo Ferrucci (David A. Ferrucci, David Ferrucci, D.A. Ferrucci, D. Ferrucci):


Fan J, Ferrucci D, Gondek D, Kalyanpur A. PRISMATIC: Inducing Knowledge from a Large Scale Lexicalized Relation Resource. In: First International Workshop on Formalisms and Methodology for Learning by Reading (FAM-LbR).; 2010:122.

Ferrucci D. Build Watson: an overview of DeepQA for the Jeopardy! challenge. In: Proceedings of the 19th international conference on Parallel architectures and compilation techniques.; 2010:1-2.

Ferrucci D, Brown E, Chu-Carroll J, et al., others. Building Watson: An Overview of the DeepQA Project. AI Magazine. 2010;31(3):59.

Ferrucci D, Lally A. Building an example application with the unstructured information management architecture. IBM Systems Journal. 2010;43(3):455-475.


Ferrucci D, Lally A, Verspoor K, Nyberg A. Unstructured Information Management Architecture (UIMA) Version 1.0. Oasis Standard. 2009.

Ferrucci D, Nyberg E, Allan J, et al., others. Towards the Open Advancement of Question Answering Systems. IBM Research Report. RC24789 (W0904-093), IBM Research, New York. 2009.


Drissi Y, Boguraev B, Ferrucci D, Keyser P, Levas A. A Development Environment for Configurable Meta-Annotators in a Pipelined NLP Architecture. In: LREC.; 2008.

Fodor P, Lally A, Ferrucci D. The prolog interface to the unstructured information management architecture. Arxiv preprint arXiv:0809.0680. 2008.


Bringsjord S, Ferrucci D. BRUTUS and the Narrational Case Against Church's Thesis (Extended Abstract). 2007.


Chu-Carroll J, Prager J, Czuba K, Ferrucci D, Duboue P. Semantic search via XML fragments: a high-precision approach to IR. In: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.; 2006:445-452.

Ferrucci D, Grossman RL, Levas A. PMML and UIMA based frameworks for deploying analytic applications and services. In: Proceedings of the 4th international workshop on Data mining standards, services and platforms.; 2006:14-26.

Ferrucci D, Lally A, Gruhl D, et al., others. Towards an interoperability standard for text and multi-modal analytics. IBM Res. Rep. 2006.

Ferrucci D, Murdock JW, Welty C. Overview of Component Services for Knowledge Integration in UIMA (aka SUKI). IBM Research Report RC24074. 2006.

Ferrucci DA. Putting the Semantics in the Semantic Web: An overview of UIMA and its role in Accelerating the Semantic Revolution. 2006.

Murdock J, McGuinness D, Silva P da, Welty C, Ferrucci D. Explaining conclusions from diverse knowledge sources. The Semantic Web-ISWC 2006. 2006:861-872.


Fikes R, Ferrucci D, Thurman D. Knowledge associates for novel intelligence (kani). In: 2005 International Conference on Intelligence Analysis.; 2005.

Levas A, Brown E, Murdock JW, Ferrucci D. The Semantic Analysis Workbench (SAW): Towards a framework for knowledge gathering and synthesis. In: Proc. Int’l Conf. in Intelligence Analysis.; 2005.

Mcguinness DL, Pinheiro P, William SJ, Ferrucci MD. Exposing Extracted Knowledge Supporting Answers. Stanford Knowledge Systems Laboratory Technical 12. 2005.

Murdock JW, Silva PPD, Ferrucci D, Welty C, Mcguinness D. Encoding Extraction as Inferences. In: Stanford University. AAAI Press; 2005:92-97.

Welty C, Murdock JW, Da Silva PP, et al. Tracking information extraction from intelligence documents. In: Proceedings of the 2005 International Conference on Intelligence Analysis (IA 2005).; 2005.


Ferrucci D, Lally A. UIMA: an architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering. 2004;10(3-4):327-348.

Nyberg E, Burger JD, Mardis S, Ferrucci D. Software Architectures for Advanced Question Answering. New Directions in Question Answering. 2004.

Nyberg E, Burger JD, Mardis S, Ferrucci DA. Software Architectures for Advanced QA. In: New Directions in Question Answering.; 2004:19-30.


Chu-Carroll J, Ferrucci D, Prager J, Welty C. Hybridization in question answering systems. In: Working Notes of the AAAI Spring Symposium on New Directions in Question Answering.; 2003:116-121.

Chu-Carroll J, Prager J, Welty C, et al. A multi-strategy and multi-source approach to question answering. NIST SPECIAL PUBLICATION SP. 2003:281-288.

Ferrucci D, Lally A. Accelerating corporate research in the development, application and deployment of human language technologies. In: Proceedings of the HLT-NAACL 2003 workshop on Software engineering and architecture of language technology systems-Volume 8.; 2003:67-74.


Welty CA, Ferrucci DA. A formal ontology for re-use of software architecture documents. In: Proceedings of the 14th IEEE International Conference on Automated Software Engineering (1999); 2002:259-262.


Bringsjord S, Ferrucci D. Artificial Intelligence and Literary Creativity: Inside the Mind of Brutus, A Storytelling Machine. Lawrence Erlbaum; 1999.

Welty CA, Ferrucci DA. Instances and classes in software engineering. intelligence. 1999;10(2):24-28.

= = =

[APPLICATION] Method For Processing Natural Language Questions And Apparatus Thereof

[APPLICATION] System And Method For Providing Question And Answers With Deferred Type Evaluation

[APPLICATION] System and method for providing answers to questions
US Pat. 12152411 - Filed May 14, 2008 - International business machines corporation

[APPLICATION] Method and system for characterizing unknown annotator and its type system with respect to reference annotation types and associated reference taxonomy nodes
US Pat. 11620189 - Filed Jan 5, 2007

Method and system for characterizing unknown annotator and its type system with respect to reference annotation types and associated reference taxonomy nodes
US Pat. 7757163 - Filed Jan 5, 2007 - International Business Machines Corporation.

[APPLICATION] Method And Apparatus For Managing Instant Messaging
US Pat. 11459694 - Filed Jul 25, 2006

[APPLICATION] Autonomous system and method for creating readable scripts for concatenative text-to-speech synthesis (TTS) corpora
US Pat. 11332292 - Filed Jan 17, 2006 - International Business Machines Corporation

Question answering system, data search method, and computer program
US Pat. 7844598 - Filed Sep 22, 2005 - Fuji Xerox Co., Ltd.

[APPLICATION] System, method and computer program product for performing unstructured information management and automatic text analysis, and including a document common analysis system
US Pat. 10449264 - Filed May 30, 2003 - International Business Machines Corporation

[APPLICATION] System, method and computer program product for performing unstructured information management and automatic text analysis, including an annotation inverted file system facilitating indexing and searching
US Pat. 10449398 - Filed May 30, 2003 - International Business Machines Corporation

[APPLICATION] System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations
US Pat. 10449409 - Filed May 30, 2003 - International Business Machines Corporation

System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations
US Pat. 7139752 - Filed May 30, 2003 - International Business Machines Corporation

[APPLICATION] System, Method and Computer Program Product for Performing Unstructured Information Management and Automatic Text Analysis
US Pat. 10448859 - Filed May 30, 2003 - International Business Machines Corporation

Method and system for loose coupling of document and domain knowledge in interactive document configuration
US Pat. 7131057 - Filed Feb 4, 2000 - International Business Machines Corporation

Method and system for document component importation and reconciliation
US Pat. 7178105 - Filed Feb 4, 2000 - International Business Machines Corporation

Method and system for automatic computation creativity and specifically for story generation
US Pat. 7333967 - Filed Dec 23, 1999 - International Business Machines Corporation