The Web is probably the richest information repository in human history, but most of its information is passive and unstructured. The Web doesn't know what it carries or for what purpose, and its users cannot specify what they want from it. Some sites use structured information storage and queries, but they are just little islands of order in a chaotic sea of information, and they do not communicate with each other.
Since 1995, various proposals have appeared for meta-data representation and communication standards, along with other services and tools that may eventually merge into the global Semantic Web. Hopefully, in the next few years we will see universal adoption of open standards for representation and sharing of meta-information.
The Web should be aware of the content and purpose of its documents and links, and interests of its users, and make the best use of all encoded knowledge. Open semantic standards and communication protocols will allow creation of various services for knowledge gathering, storage, and distribution, as well as user-friendly client-side utilities that would communicate with these services and provide intelligent content selection, processing, and representation functions.
I expect that the next generation of information services will do for Web semantics what HTML and HTTP have done for its communication layer: build a foundation for a global, intelligent, reactive knowledge exchange system.
In this paper, I will attempt to review existing efforts and proposals, suggest some additional areas for development, discuss perspectives of distributed knowledge-processing systems, and explore technological and organizational efforts necessary for a coordinated implementation of this vision.
It may not be necessary to formulate these ideas as a proposal. I am quite sure that a system like this is as inevitable a step in the development of a civilization as the creation of communication infrastructure. Most of the needed technologies are already in various stages of development, and the realization of a system similar to the one suggested here seems just a matter of time, whether I or anybody else comes up with such a proposal or not. However, implementation of technological inevitabilities needs visionary facilitation. Without a good vision, system developers tend to make stupid and costly mistakes, such as the DOS 640K memory limit, the year 2000 problem, or the campaign against "the commercialization of the Net". With a narrow scope of interest, the participants either attempt to take proprietary control over personal data and encoding standards, or steer the development in the direction of their particular interests. As a result, the real process - the development of the global body of knowledge - gets distorted and slows down.
A description of a person may include: name, address, gender, e-mail, home page, date of birth. The Open Profile Standard was recently introduced for this purpose. Another example of an existing personal profile standard is the Geek Code. The Geek Code probably would have been a lot more successful if somebody had developed an interface for translating between this code and English, and some software for matching people based on this code.
A description of an event may include: type of an event (concert, conference, etc.), start and end times, location, price, attendance conditions. See, for example, the event entry form from Events directory.
A description of an item for sale may include: item description, owner, price, offer expiration date.
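The three kinds of descriptions listed above can be sketched as simple typed records. This is only an illustration of the idea, not an existing standard: the class names, field names, and the `to_record` helper are assumptions made for this example, and dates are assumed to be ISO 8601 strings.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PersonDescription:
    # Fields follow the list in the text; all but the name are optional.
    name: str
    address: Optional[str] = None
    gender: Optional[str] = None
    email: Optional[str] = None
    home_page: Optional[str] = None
    date_of_birth: Optional[str] = None  # assumed ISO 8601, e.g. "1990-01-31"


@dataclass
class EventDescription:
    event_type: str  # e.g. "concert", "conference"
    start: str
    end: str
    location: str
    price: Optional[float] = None
    attendance_conditions: Optional[str] = None


@dataclass
class SaleItemDescription:
    description: str
    owner: str
    price: float
    offer_expires: Optional[str] = None


def to_record(item) -> dict:
    """Flatten a description into a plain dict, dropping unset fields
    and adding a 'kind' key so a receiver can tell the record types apart."""
    record = {key: value for key, value in asdict(item).items() if value is not None}
    record["kind"] = type(item).__name__
    return record


if __name__ == "__main__":
    event = EventDescription(
        event_type="concert",
        start="1997-06-01T19:00",
        end="1997-06-01T22:00",
        location="Cambridge, MA",
        price=15.0,
    )
    print(to_record(event))
```

Keeping the record a plain dictionary with an explicit `kind` key is one possible way to let different services exchange such descriptions without sharing class definitions.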
Practically, at this point, there is a huge number of description forms for all kinds of items, with some meta-services, such as Submit-It for Web pages, providing translations to them.
Some suggestions on link types are available from W3C.
Relations between people and companies may be: employee, owner, member, founder, supporter, etc. - with descriptions of roles, periods of involvement, etc.
Relations between people may be: friend, spouse, employer, teacher, client, etc. The Six Degrees website attempts to collect these relations in a proprietary database.
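Typed relations of this kind could be represented as labeled links between two items, with optional role and period attributes. The following is a minimal sketch under that assumption; the `Relation` class, its field names, and the sample people and company are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Relation:
    subject: str                       # who or what the relation starts from
    relation_type: str                 # e.g. "employee", "founder", "friend"
    obj: str                           # the person or company it points to
    role: Optional[str] = None         # free-text description of the role
    start_year: Optional[int] = None   # period of involvement, if known
    end_year: Optional[int] = None


def relations_of(relations: List[Relation], subject: str,
                 relation_type: Optional[str] = None) -> List[Relation]:
    """Return the relations that start at `subject`, optionally
    restricted to a single relation type."""
    return [
        r for r in relations
        if r.subject == subject
        and (relation_type is None or r.relation_type == relation_type)
    ]


if __name__ == "__main__":
    # Hypothetical sample data.
    data = [
        Relation("Alice", "employee", "Example Corp", role="engineer",
                 start_year=1994, end_year=1997),
        Relation("Alice", "friend", "Bob"),
        Relation("Bob", "founder", "Example Corp", start_year=1990),
    ]
    for relation in relations_of(data, "Alice"):
        print(relation)
```

Because each relation is an open, self-describing record, any service could index the same data, in contrast to relations locked into a single proprietary database.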
<DIV CLASS=PREFACE> [preface content] </DIV>
on the paragraph level:
<ADDRESS> Alexander Chislenko, 6 McLean Pl. #5 Cambridge MA 02140-2437 USA </ADDRESS>
And on the word level:
<DATE>02Dec1959</DATE>
Still, these are just hints at possible syntax; only a few tags are suggested, and there is no way to tell how to parse the address or what the date may refer to.
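The limitation can be seen by extracting the tagged fragments with a generic parser. The sketch below uses Python's standard `html.parser` module; the `SemanticTagExtractor` class and `extract` function are names invented for this illustration. The parser can find the tagged text, but all it recovers is an unstructured string for each tag.

```python
from html.parser import HTMLParser


class SemanticTagExtractor(HTMLParser):
    """Collect the text inside a chosen set of tags.
    Nesting of wanted tags is not handled in this sketch."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = {tag.lower() for tag in wanted}
        self.current = None   # wanted tag that is currently open, if any
        self.buffer = []      # text pieces seen inside the open tag
        self.found = []       # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag in self.wanted and self.current is None:
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            text = " ".join("".join(self.buffer).split())  # normalize whitespace
            self.found.append((tag, text))
            self.current = None


def extract(markup, wanted=("address", "date")):
    parser = SemanticTagExtractor(wanted)
    parser.feed(markup)
    parser.close()
    return parser.found


if __name__ == "__main__":
    sample = (
        "<DIV CLASS=PREFACE> [preface content] </DIV>"
        "<ADDRESS> Alexander Chislenko, 6 McLean Pl. #5 "
        "Cambridge MA 02140-2437 USA </ADDRESS>"
        "<DATE>02Dec1959</DATE>"
    )
    for tag, text in extract(sample):
        print(tag, "->", text)
```

The output is a list of raw strings: nothing in the markup says which part of the address is the street or the postal code, or whether the date is a birthday, a publication date, or something else.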
Further work in this direction should probably include organizational efforts to ensure compatibility of multiple ontologies, "ontology negotiation" services between different agencies, and integration of item ontologies with the general commonsense concept ontologies of networked AI systems.
In all cases, these requests should be described in formalized formats, together with specifications of desired formats of expected data and locations of agents that should receive it.
A notable effort in this direction is represented by RDM (Resource Description Messages) - a mechanism to discover and retrieve metadata about network-accessible resources. It is based on Harvest's Summary Object Interchange Format (SOIF) - a syntax for transmitting resource descriptions and other kinds of structured objects.
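To give a feel for what a SOIF-style resource description looks like, here is a small serializer sketch. It reflects a simplified reading of the format (a template type and URL, followed by attribute-value pairs where each attribute name carries the byte size of its value) and should not be taken as a complete or authoritative implementation; the `to_soif` function and the sample attribute values are assumptions made for this example.

```python
def to_soif(template_type: str, url: str, attributes: dict) -> str:
    """Render a resource description in a simplified SOIF-like layout:

        @TYPE { URL
        Attribute-Name{size}:<TAB>value
        }

    The size is the length of the value in bytes (UTF-8 is assumed here).
    """
    lines = [f"@{template_type} {{ {url}"]
    for name, value in attributes.items():
        size = len(value.encode("utf-8"))
        lines.append(f"{name}{{{size}}}:\t{value}")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical resource and attribute values.
    print(to_soif(
        "FILE",
        "http://example.org/paper.html",
        {"Title": "Semantic Web", "Type": "HTML"},
    ))
```

The explicit size lets a receiver read each value without scanning for a terminator, which is what makes such a layout suitable for values that span several lines.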
The services here may include:
The knowledge that a common-sense system may derive from an open semantic dataset may include statements like:
Eventually, we may see a wide variety of networked knowledge processing servers collecting and generalizing data in their own areas and cooperating with each other for "interdisciplinary" problem solving, first with direct human involvement, and then increasingly on their own.
Some suggestions on how to implement a distributed reasoning system, using multiple specialized copies of CYC with KQML for communication between them, are available at The Cycic Friends Network.
Another interesting solution is represented by Agent Communication Language (ACL) developed within the ARPA Knowledge Sharing Effort. ACL has a vocabulary, an "inner language" called KIF (Knowledge Interchange Format), and an "outer" language called KQML (Knowledge Query and Manipulation Language). An ACL message is a KQML expression in which the arguments are terms or sentences in KIF formed from words in the ACL vocabulary.
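The layering described here, an outer KQML expression wrapping inner KIF content, can be illustrated with a short message builder. This is a sketch of the general message shape only: the `kqml_message` helper, the agent names, the ontology name, and the KIF predicate are all hypothetical, and no validation of KQML or KIF syntax is attempted.

```python
def kqml_message(performative: str, **params: str) -> str:
    """Build a KQML-style expression: a performative followed by
    ':keyword value' pairs. Underscores in Python keyword names are
    turned into hyphens, so reply_with becomes :reply-with."""
    parts = [performative]
    for key, value in params.items():
        parts.append(f":{key.replace('_', '-')} {value}")
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    # The :content value is an inner-language (KIF-style) sentence whose
    # words come from the shared vocabulary named by :ontology.
    message = kqml_message(
        "ask-one",
        sender="client-agent",          # hypothetical agent name
        receiver="event-server",        # hypothetical agent name
        language="KIF",
        ontology="events",              # hypothetical ontology name
        reply_with="q1",
        content="(event-price concert-42 ?price)",  # hypothetical predicate
    )
    print(message)
```

The outer layer tells the receiving agent what kind of speech act is intended and how to interpret the payload, while the inner sentence carries the actual question.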