“It may not be possible to organize the whole web... but it may be possible to develop an organizing map of the web.”
WEB IS A FOREST ... SEMANTIC WEB A JAPANESE GARDEN?
Musings of a student of Ranganathan
F.J. DEVADASON
devadason_f_j@yahoo.com
TO THE MEMORY OF TWO OF MY TEACHERS OF LIBRARY AND INFORMATION SCIENCE
PROF. M.R. KUMBHAR and PROF. D.B. KRISHNA RAO
(Both were quite firm and strict, yet kind. The former accepted me as a research student against the odds, encouraged me, and supervised my research at Karnatak University, Dharwad; but for him I would never have become an academic. The latter instilled in me an interest in Colon Classification at the University of Madras, he being the only Ph.D. supervised by Dr. S.R. Ranganathan.)
IS THE PRESENT WEB NON-SEMANTIC?
This question came to my mind as we are being bombarded with the concept of the Semantic Web, its glories lifted up to the heavens. While discussing the importance of correctly naming concepts and using correct, meaningful terms in designing a faceted classification, Dr. Ranganathan mentioned in a class that one method is to verify the meaning of the antonym and check the appropriateness / truthfulness of the term. The truth is that the present web is as semantic (meaningful) as anything else; otherwise no one could make any sense of it, and you know what its fate would have been. It is not as if the web is non-semantic and one has to create a Semantic Web to make it meaningful. True, the digital documents (objects) in it (text, image, animation, video, audio, or any combination of these) are not well structured like the documents (records) in a database, so they may not be as easily processable as records in a database. But at the same time, well-structured databases, processable by specific software to generate answers to queries, are also embedded in the web as unique objects. The PRESENT WEB IS SEMANTIC and meaningful for the millions who access it, use it, and add to it.
THE PRESENT WEB IS ALMOST LIKE A FOREST
Because anyone can add to the web, it has grown enormously and uncontrollably; it is a peculiar man-made forest, or jungle. It is unorganized, and most of the information in it is ill structured, because the web was not designed as a proper global cooperative information system in the first place. True to the nature of a forest, dangerous animals also live in it, lurking secretly to devour the unsuspecting victim. Will anyone attempt to organize a forest? Trim it down, make a Japanese garden out of it, and call it a "Semantic Garden" or a "Semantic Forest"? Yet such attempts are now being made, under the name "Semantic Web". It is not possible, because the web is growing every second, and chaotically too.
Freedom of expression and the democratic movement have taken root in every educated individual, irrespective of the ideology followed in individual societies. If structure is forced, then Japanese-garden-type web documents will be about as numerous as such gardens are in the real world, for everyone else will turn to blogs, MySpace, e-mail-to-web publishing and other such easy methods, leaving the Japanese-garden-type web documents to the "elite" and the rich. The web will be as chaotic as ever, and special search engines such as blog search engines, e-mail search engines, RSS feed search engines and the like will become the order of the day.
Semantic Webbers would like to keep the forest, but make it a processable one! They would like the unstructured or ill-structured documents to be well structured. Is that achievable? All right, you use RDF, XML and ontologies (an ontology is essentially a faceted classification scheme). At present it has been proposed that ontologies must be domain specific. This is again like the librarians' method of managing information.
Librarians knew that it would be difficult for one library to collect and organize everything, and so created subject-specific libraries and information systems: you have the National Library of Medicine, the National Agricultural Library, and so on. Now the Semantic Webbers want to follow this model and suggest that ontologies must be domain specific!
All right, you have domain-specific ontologies / faceted classification schemes. All the documents on the web are put up as RDF using OWL, ontologies and all that. That is, you have all the metadata and all the agents. So what? They are processable; you can find out which medical doctor is available at a time suitable to you, and so on, PROVIDED YOU FIRST IDENTIFY THE WEB DOCUMENTS HAVING THAT TYPE OF DATA. But how are you going to identify which document or documents have to be accessed for further processing by your agent? Is that also going to be identified by your agent? Are you going to have an agent of agents to select the appropriate agent? Is your agent going to traverse the web to find the documents to be processed? Or is it that, for a query, the super-agent to which the query is submitted selects the specific special agent or agents, the query is processed using ontologies, and then the agent enters the web? But where? Into the forest, to start processing everything processable? You are going to have trouble, as you may have to crawl at least a good part of the web, following the links in the web documents identified, to get the right data to process. What to do then? You have to have an index to the web, which you search first; from it you select the best fit, identify whether the required type of data is available in it, and only then process it.
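To put this search-first idea in a rough sketch: the agent consults an index of surrogate records and only then visits the few matching sites. Everything below (the record fields, the toy index, the function name) is my own hypothetical illustration, not any existing system.

    # Hypothetical sketch: the agent searches a surrogate index first,
    # instead of crawling the forest of the web itself.
    from dataclasses import dataclass, field

    @dataclass
    class Surrogate:
        url: str
        summary: str
        data_types: set = field(default_factory=set)  # processable data declared by the site

    # A toy surrogate index; in reality a large, searchable file.
    INDEX = [
        Surrogate("http://example.org/clinic", "City clinic schedules",
                  {"doctor-availability"}),
        Surrogate("http://example.org/news", "General news site"),
    ]

    def select_sites(needed):
        # Keep only sites whose surrogates declare the needed data;
        # only these few would ever be visited and processed.
        return [s.url for s in INDEX if needed in s.data_types]

    print(select_sites("doctor-availability"))  # ['http://example.org/clinic']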
A MAP FOR THE WEB FOREST
When you wish to traverse a desert, you perhaps cannot have a map to rely on, though you may be able to use a compass. The early explorers, however, did a great job of developing maps of the routes they undertook. Not only did they develop maps of the routes, they also kept diaries of the customs of the places they visited, the dangers, the wise things to do, and so on. They had maritime maps and complicated codes using the stars and planets.
Just as the early explorers set out to explore the earth and prepared maps indicating what is where, what precautions one must take to be there, and so on, a map of the web must be constructed. I once saw a cartoon, perhaps drawn by R.K. Laxman after he visited Los Angeles during the 1980s and got lost in the mesh of roads while venturing to explore the terrain on foot; that would have been the most appropriate picture here. It was published perhaps in Span or The Hindu, I am not sure. If anyone comes across a cartoon of a weary backpacker on foot beside a flyover, looking at a map of the flyover with the sign "You are here", please let me know.
Nowadays, even when we want to visit a fairly medium-sized and well-laid-out garden or park, or even a graveyard (cemetery), we need a map indicating where we are and where other things are. [I visited the Rabaul War Cemetery <http://www.roll-of-honour.org.uk/Cemeteries/Rabaul_War_Cemetery/> in Papua New Guinea.] Should we not have a map for the web? Of course, we have to have the map at different levels. We can even have maps for specific domains maintained at different locations, just like a national system of libraries with specific libraries assigned to specific areas or subjects. We can create a surrogate or summary record of each web document, like the one suggested in the paper “Faceted Indexing Based System for Organizing and Accessing Internet Resources” (the surrogate must be enriched with more information, as mentioned below). We cannot organize the entire web, but we can organize the surrogates / summary records! If the surrogates are domain specific, so much the better. The surrogate index will contain not only the summary of each site indexed, but also information on the data available, its structure, how to submit a query to the specific database contained in the site, and so on. The agent then has only to select the relevant segment of the surrogate file, pick the required site / database, find out from the surrogate how to process it further, and go to the selected few sites to complete the job. How to process the data contained in a site retrieved by searching the surrogates is present in the surrogate itself. We can have a master index and domain-specific indexes / surrogates. Even if the data in the target site is presented in a table, it is enough if the surrogate record carries the structure information, so that appropriate processing routines can be selected. The target site need not follow the Semantic Web standards. This is the same idea as a catalog of media material: the surrogate record describes the media material and its characteristics, such as its resolution and the equipment required to play it.
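As a concrete illustration, such an enriched surrogate record might look like the sketch below. Every field name and value here is my own assumption, offered only to show the kind of information a surrogate could carry, not a proposed standard.

    # Hypothetical enriched surrogate / summary record for one web document.
    surrogate = {
        "url": "http://example.org/hospital/doctors",
        "summary": "Consultation schedule of doctors at a city hospital",
        "lsde_heading": "Medicine. Hospital. Doctor. Consultation schedule",
        "database": {
            "present": True,
            "structure": "table",  # how the data is laid out in the document
            "datanames": ["doctor", "specialty", "day", "hours"],
            "sample_row": ["A. Rao", "Cardiology", "Monday", "09:00-12:00"],
            "query_method": "HTTP GET with ?specialty=<term>",
        },
        # Better still, a link to (or a copy of) the processing routine itself.
        "processing_routine": "http://example.org/routines/schedule_table.py",
    }
    print(surrogate["database"]["datanames"])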
The possible surrogate systems are systems like Google, Yahoo, etc., which have already built up the necessary tools. What is lacking is the inclusion of a facet-analyzed subject heading, with appropriate superordinates for each component in the heading, enriched with synonyms. The required classification and indexing tool, called a Classaurus, can also be derived from this facet-analyzed subject heading. The facet-analyzed subject heading / POPSI heading (for style's sake it can be referred to as a Logico-Semantic Domain Expression (LSDE), which is nothing but a structured subject heading) may have to be assigned to each of the meaningful units / sections and subsections of the web document, as required.
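To make the organizing effect of such headings concrete: the headings below are invented, but sorting them even as plain strings already brings related documents together, which is the effect a map of the web would rely on.

    # Invented LSDE-style headings (Discipline. Entity. Part. Property/Action).
    headings = [
        "Agriculture. Rice. Pest. Control",
        "Medicine. Heart. Valve. Surgery",
        "Agriculture. Rice. Irrigation",
        "Medicine. Heart. Valve. Disease",
        "Agriculture. Wheat. Harvesting",
    ]

    # Plain lexicographic sorting clusters related subjects together:
    for h in sorted(headings):
        print(h)
    # Agriculture. Rice. Irrigation
    # Agriculture. Rice. Pest. Control
    # Agriculture. Wheat. Harvesting
    # Medicine. Heart. Valve. Disease
    # Medicine. Heart. Valve. Surgery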
If, because of complicated standards, putting a page on the web becomes like touching your nose by winding your hand around the back of your neck, then the freedom-loving takers of the line of least action will simply put up their web documents as blogs, HTML e-mail, MySpace pages, discussion lists and other such easy, quick methods, caring nothing for the high-flown standards of RDF, OWL, ontologies (faceted classification schemes) and the rest. The web would then become more chaotic than it is today. However, any processable data / database in an individual web document could be indicated in its surrogate, with structure identification and, if necessary, a sample of the data: perhaps one row from the table in the web document, with datanames (metadata) attached, could be put in the surrogate for easy identification of processing routines. Better still, any routine necessary for processing could be stored in the surrogate itself. But for solutions that require data from different documents to be merged, some generalized processing routines would be required.
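Here is a small sketch of how that sample row with datanames might drive the choice of a processing routine; the routine names and the matching rule are my own assumptions, not part of any standard.

    # Hypothetical dispatch: the datanames recorded in the surrogate identify
    # the data structure, so the agent can pick a routine without first
    # inspecting the target site.
    ROUTINES = {
        ("doctor", "specialty", "day", "hours"): "process_schedule_table",
        ("item", "price", "currency"): "process_price_list",
    }

    def pick_routine(datanames):
        # Fall back to a generalized routine when no exact match exists.
        return ROUTINES.get(tuple(datanames), "generic_table_routine")

    print(pick_routine(["doctor", "specialty", "day", "hours"]))
    # -> process_schedule_table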
LET ME RECAPITULATE
1) Because the facet-analyzed POPSI (POstulate based Permuted Subject Index) headings produce an organizing effect when sorted, each web document and each of its worthwhile sections and subsections, whether coherent text, image, audio, video or any combination of these, must be fitted with such a heading, having a Base or Discipline with its divisions and subdivisions as applicable, the Main Object or Core Entity with its species, parts and constituents, any Property of or Action on or by the Object with its types, followed by Common Modifiers, each of them "modulated" by its respective superordinates and enriched with synonyms. If these headings are sorted, they would constitute the World Wide Web Map (WWWM), exhibiting an organizing effect, as in the sorting sketch above. [Please see Bhattacharyya's "POPSI: its fundamentals and procedure based on a general theory of subject indexing languages", Library Science with a Slant to Documentation, Vol. 16 (1979), No. 1, pp. 1-34, which is not available on the web. However, you can access this <http://drtc.isibang.ac.in/~guha/popsi/popsi-doc.pdf>. You may not have "International Federation for Documentation / Classification Research Report No. 21: Computerized Deep Structure Indexing System, Indeks Verlag, Frankfurt, 1986", but you can have a look at "Online Construction of Alphabetic Classaurus" </devadason.geo/OnlineClassaurus.htm>, and of course http://us.share.geocities.com/devadason.geo/DSIS.pdf.] The most important classic document, from which all the facet analysis stalwarts -- those who have developed "facet analysis for dummies", "easy facet analysis" and "the true and simple facet analysis", and brought fame to themselves -- have copied, is "Prolegomena to Library Classification" by S.R. Ranganathan, available at <http://dlist.sir.arizona.edu/1151/>. That is the third edition, 1967; somehow I have a fascination for the second edition, 1957.
2) Such headings (Logico-Semantic Domain Expressions, LSDEs) could be formed easily by the web document builders, as it is almost like forming an expressive title for the web document, which can be done by answering a set of questions and following a few guidelines. Or it can be done by the surrogate creators / search systems. The initial effort would become less time consuming as the web maps get built and are made available for reference while creating the subject headings.
3) These subject headings could be used to form the Classaurus (a faceted classification scheme with vocabulary-control features), to ease the translation of faceted headings from one language to another (there are some problems, since certain concepts do not exist in certain languages and terms to denote them are not available, and so on), and to categorize the web documents.
4) It would be worthwhile exploring the possibility of utilizing the existing library system, such as the different national libraries with the necessary expertise in particular fields, to create and maintain specific WWW Maps for their areas of expertise. For instance, the National Agricultural Library could be the agency responsible for mapping any web document categorized as belonging to agriculture; it has a well-developed thesaurus and would be the most appropriate agency for developing the agriculture WWWM. It could be assigned a range of IP addresses (covering a specific geographical area or so) to monitor, keeping its part of the map up to date. In a similar way, existing information-handling expertise could be channelled into national WWW Maps in different subject areas, which could then be merged. The maps could be language specific, but there should be an English-language LSDE for every subject heading in the summary record / surrogate. Switching between languages may be possible, but there are problems when such modulated subject headings are translated from one language to another; even the hierarchy would be a bit difficult to map correctly! I do not want to go into examples here. The allocation of work could be in the style of a cooperative global information system: subject specific, language specific and nationality specific, with full coverage of designated IP address ranges of web documents assigned to individual agencies, which would build and maintain these maps while avoiding any duplication of effort.
5) Any specific data or database included in a web document could be indicated in the surrogate for that web document, with a model of it or its structure, or even an example consisting of one row of the data with the attached datanames / metadata. This will help in preparing the processing routines for further processing of the data available in the web document, as in the dispatch sketch above. It would be even better to store the processing agent in the surrogate itself, or to provide a link to it, so that it can be loaded for processing.
E-mail: devadason_f_j@yahoo.com
21 May, 2007.