Newsgroups: comp.databases,comp.lsi,comp.parallel.pvm,comp.parallel.mpi,comp.org.acm,comp.org.ieee,comp.protocols.misc,comp.realtime,comp.arch,comp.software-eng,comp.sys.super,comp.theory,comp.dsp,sci.math
From: Vincent Johns <vjohns@pop1.backbone.ou.edu>
Subject: Re: Publishing Scholarly Work on the Web -- opinions?
Organization: Telepath
Date: Wed, 25 Sep 1996 11:39:09 -0500
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <3249602D.7D44@pop1.backbone.ou.edu>

(posted & emailed)

robert bristow-johnson <robert@audioheads.com> wrote:
> 
> 
> In article <Dy39J2.3Fw@undergrad.math.uwaterloo.ca>, Erik Demaine (eddemain@neumann.uwaterloo.ca) writes:
> >robert bristow-johnson (robert@audioheads.com) wrote: [trimmed]
> >: In article <3236515D.5B58@cerfnet.com>, N. Gat <oksi@cerfnet.com> wrote:
> >: >                         A FEW TECHNICALITIES:
> >: >
> >: > (i) The entire paper must be submitted in HTML, and graphs, figures and
> >: > charts in gif or jpg format  (this is a deviation from common formats
> ...
> >:
> >: aren't there software tools that can convert PostScript into HTML?
> >: if not, someone could make some dough creating such a tool.
> >
> >Actually, that's pretty much impossible.
> 
> conceptually, i don't see why.  admittedly, i'm not familiar
> with the details of either HTML or PostScript, but i have
> looked at files of both and it looks like it's, among other stuff,
> some collection of graphical objects and text objects which
> should be able to be parsed and translated.  HTML has some
> other weird objects like those little underlined words that
> link you to other web sites so i can see why an HTML -> PS
> converter would be a problem.  but i don't see why a
> PS -> HTML converter would be.  it would just be kinda "dead"
> or non-interactive HTML.

There are a couple of ways to convert.  The simple way, though the
result isn't easily searchable, is to use Ghostscript or something
similar to render the PS document into GIF or TIFF or whatever, and
link the page images via HTML.  The second way is to use an OCR
program (I use Cuneiform OCR, sold by Cognitive Technology
Corporation, at ctc@well.com) to extract the text.  The problem with
PS is that the text may be stored as plain ASCII, as ASCII
hexadecimal strings, or in a weird order on the page.  After
rendering via Ghostscript, you can see where the text is and convert
it in a reasonable sequence.  The pictures can probably just be
cropped and left as images that the HTML links to.  Acrobat
documents look nice, but I'm not sure that the text can easily be
accessed as text, and for searching one would need to do that.
(Someone please correct me if I'm mistaken about Adobe Acrobat.)
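For anyone who wants to try the simple way, here is a rough sketch
in Python, under some assumptions: Ghostscript is installed and
callable as `gs`, and PNG output (the `png16m` device) is used in
place of GIF or TIFF because browsers can display it directly.  The
helper names `ghostscript_command`, `image_index_html` and `convert`
are made up for illustration; this is one way to do it, not a
standard tool.  It renders every page to an image and writes an
index.html that shows the pages in order, so the result is readable
but, as noted, not searchable.

```python
import html
import subprocess
from pathlib import Path


def ghostscript_command(ps_file, out_pattern="page-%03d.png", dpi=100):
    """Build a Ghostscript command that renders each page of ps_file to a PNG.

    Assumes Ghostscript is on the PATH as "gs" (hypothetical setup)."""
    return [
        "gs",
        "-dNOPAUSE",          # don't pause for a keypress between pages
        "-dBATCH",            # exit when the input file is finished
        "-dSAFER",            # restrict file access by the PostScript program
        "-sDEVICE=png16m",    # 24-bit colour PNG output (swapped in for GIF/TIFF)
        f"-r{dpi}",           # rendering resolution in dots per inch
        f"-sOutputFile={out_pattern}",  # %03d is replaced by the page number
        ps_file,
    ]


def image_index_html(title, image_names):
    """Return a plain HTML page that displays each page image in order."""
    safe_title = html.escape(title)
    parts = [
        f"<html><head><title>{safe_title}</title></head><body>",
        f"<h1>{safe_title}</h1>",
    ]
    for number, name in enumerate(image_names, start=1):
        src = html.escape(name, quote=True)
        parts.append(f'<p><img src="{src}" alt="Page {number}"></p>')
    parts.append("</body></html>")
    return "\n".join(parts)


def convert(ps_file, out_dir, title):
    """Render ps_file into out_dir and write out_dir/index.html linking the pages."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Use an absolute input path because Ghostscript runs inside out_dir.
    command = ghostscript_command(str(Path(ps_file).resolve()))
    subprocess.run(command, cwd=out, check=True)
    # Zero-padded names (page-001.png, ...) sort correctly up to 999 pages.
    pages = sorted(p.name for p in out.glob("page-*.png"))
    (out / "index.html").write_text(image_index_html(title, pages), encoding="utf-8")
```

Something like convert("paper.ps", "paper_html", "My Paper") would
leave page-001.png, page-002.png, ... and an index.html in
paper_html.  The OCR step would then be run on those same page
images if you also want searchable text.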


> >I have another technicality question that is very important: where are you
> >going to get the diskspace?
> 
> i dunno.  where does DejaNews get all of it?  they seem to have
> _all_ usenet postings for the last 2 years accessible.

I love DejaNews!  Anyone interested can find it at
http://www.dejanews.com/ .
 

> one possible answer would be maybe the institution/company that
> the author was/is affiliated with when he "publishes" the
> paper would be where the paper is actually physically located
> on disk.  maybe N. Gat's ScienceExpo (or some other
> organization like IEEE or the "cyber-journal" the paper is
> "published" in) would just have the organization, titles,
> abstracts, and destination links to the papers.

Links don't take much space, but you'd want the actual papers to
be stored somewhere stable, at a site that could be expected to keep
them unchanged and at the same address for many months or years.
It may not be possible to guarantee that kind of permanence just yet.
 

> >  It seems to me that there're going to be a huge
> >number of potentially large (in terms of diskspace) papers.
> 
> how much physical space do they take up now?  how much effort
> and resources are spent xeroxing, microfilming, and mailing
> copies for examination?  this (online publishing) would be
> cheaper, overall, i think.

There's already a limit on the size of published papers.  Journals
limit what they publish; even theses and dissertations have to fit 
into one volume (though the restrictions are looser there).

> >  I think this would be very bad.  This means
> >you're going to need some continual funding to keep getting more disks.
> 
> how much funding do university (and corporate) libraries put
> into maintaining a decently current library of the state of
> the art _and_ their own dissertations and publications?

Scholarly stuff never expires.  Even total garbage is of potential
value in research (e.g., what would inspire a person to publish such
stuff?), and some silly-looking ideas sometimes later turn out to 
be prescient.  (I admit that most don't, but how can you know?)
 

                           -- Vincent Johns


