Lesson 1: Basics / What is CGI?
Client Side - Server Side
Whenever functionality beyond the HTML standard is needed, an additional program is required.
The first thing to decide is where that program will be executed.
If the program runs on the client side (on the user's computer), the corresponding source code has to be embedded in the HTML document.
Examples of this are JavaScript, Java applets and ActiveX components. For security reasons, however, the capabilities of these programs are heavily restricted. They are executed in a strictly secured environment (for example, the Java sandbox) and are not allowed to call system functions such as creating or deleting files. The only information the embedded programs may access is the elements of the (embedding) HTML document.
Example: A shopping system based on Java applets has to transfer the whole inventory together with the embedding HTML document. Furthermore, an embedded program has to be delivered in a cross-platform form so that its use is not limited to only a few operating systems (and/or CPUs).
In this respect only Java and JavaScript remain:
With JavaScript the source code of the program is transferred; with Java it is a portable pseudo-code (byte code). In either case another program on the client side first has to interpret the code and then run it.
This only works to the desired degree if the interpreters of the various users deliver identical (or at least similar) results. Unfortunately, this is not yet the case.
Nevertheless, client programs are always very attractive because they make better use of the resources in the network.
Programs running on a server have no such limits. Here the security question is posed differently. A lesson further below gives some answers to the relevant questions. Accordingly, server programs can access (within specific limits) a variety of system resources:
- Database
- Data
- Communication services (such as sending e-mail)
- Further programs
Furthermore, the program can take any form the server can run. So there is no need to make the code portable; instead the programmer can exploit the capabilities of the machine to the full. The programming language the programs are written in does not matter.
The only reason why the literature often speaks of CGI scripts is that most CGI programs were written in a language like Perl. Languages whose source code is analysed, interpreted and run only at runtime are called scripting languages.
Server programs always run into problems when many requests have to be answered, because (in most cases) there is only one computer whose capacity has to be divided among the requests. Furthermore, the result of every program has to be returned, which can cause considerable network load.
Some features, pros and cons of client and server programs are listed here:
Client programs:
Pros:
The program can use the resources of the client computer, which relieves the server. This can be an advantage for computation-intensive programs. Furthermore, better use can be made of the graphics and sound functions of the visitor's computer. Interacting with the user requires no further transfer (example: the client can be told about faulty input directly and correct it).
Cons:
All required information (including the program) has to be transferred first. The program is not allowed to save data, so it has no persistent storage, that is, no "memory". Database queries always involve transfers from the server. The interpreters of the programs are partly incompatible.
Application area:
Client programs are currently used predominantly for graphical effects (OnMouseOver...); sometimes forms are also pre-checked. But it is possible (and desirable) to build truly functional GUIs (graphical user interfaces) with them, through which you can access server programs.
(Unfortunately, Java programming is not as easy as HTML design.)
Server programs:
Pros:
The program can access databases and read and write data. It runs in a defined environment, so the client environments are irrelevant. It delivers its result to all clients that can handle HTML, from host systems through all kinds of PCs down to WAP mobile phones. The resources of the host computer can be used efficiently.
Cons:
The resources of the host computer are limited (even if it is a high-capacity system), which is why performance would collapse if there were many requests at the same time. The dialogue always requires a transfer of information that puts load on the network (especially if the answers have to be "graphically rich").
Application area:
All programs that need to access databases or any other central data sets.
What is CGI?
CGI stands for Common Gateway Interface and defines a standard for the transfer of information between a request and an application that carries out the request, with both sides making use of the http protocol (http = hypertext transfer protocol).
Usually the flow of information looks like this: using an http browser (Netscape Navigator, Microsoft Internet Explorer), a client sends a request in the form of a file name to the http server.
In most cases this is an HTML page that the http server can send to the client directly.
However, from certain features of the request the http server recognizes that the request should be relayed to another program. Usually these features can be read directly from the address of the request (that is, from the file name or the complete path to the file):
An address such as http://www.tdb-engine.de/demos/hello.pl is broken down into its components:
http                  the protocol (and thus the corresponding server)
//www.tdb-engine.de   the network server belonging to this domain (IP address)
/demos/               the virtual path to the requested file on the server
hello.pl              the requested file
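The breakdown above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name split_request_url and the dictionary keys are made up for this example, and a real http server does this work internally.

```python
from urllib.parse import urlsplit

def split_request_url(url):
    """Split a URL into the parts described above."""
    parts = urlsplit(url)
    # Everything up to the last "/" is the virtual path,
    # the remainder is the requested file.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "server": parts.netloc,
        "virtual_path": directory + "/",
        "file": filename,
    }

print(split_request_url("http://www.tdb-engine.de/demos/hello.pl"))
```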
The http server now determines the kind of program
a) from the extension of the requested program's name
Examples:
.pl    Perl programs
.prg   tdbengine programs
.cgi   Unix shell scripts
b) from the directory the requested file is in
Examples:
/cgi-bin/   standard CGI directory on Unix/Linux computers
/scripts/   standard CGI directory on Windows NT
/cgi-tdb/   preferred directory for tdbengine programs
Which of these two kinds of external program call the server recognizes, and which it does not, is defined in the corresponding configuration of the server.
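The two rules, a) by extension and b) by directory, can be sketched as a small decision function. The two configuration constants are hypothetical and simply reuse the examples from this lesson; real servers read such settings from their own configuration files.

```python
import posixpath

# Hypothetical configuration, modelled on the examples in this lesson.
CGI_EXTENSIONS = {".pl", ".prg", ".cgi"}
CGI_DIRECTORIES = ("/cgi-bin/", "/scripts/", "/cgi-tdb/")

def is_cgi_request(path):
    """Return True if the request path should be handed to an external program."""
    extension = posixpath.splitext(path)[1].lower()
    if extension in CGI_EXTENSIONS:
        return True          # rule a): recognized by file extension
    # rule b): recognized by directory (startswith accepts a tuple)
    return path.startswith(CGI_DIRECTORIES)

print(is_cgi_request("/demos/hello.pl"))    # extension matches
print(is_cgi_request("/cgi-bin/counter"))   # directory matches
print(is_cgi_request("/index.html"))        # neither rule applies
```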
Static and dynamic requests
Http requests that the server can answer immediately by transferring the requested files to the client, without calling on other program resources, are called static requests.
These are predominantly
HTML documents .htm, .html
Text documents .txt
Pictures .gif, .jpg, .jpeg
Sounds .mp3, .wav, .au
Requests that can only be answered with the help of other programs are called dynamic requests. These include
CGI programs (see above)
ISAPI applications .isa (not discussed here)
NSAPI applications .nsa (not discussed here)
All common http servers also support a hybrid form in which dynamic content is embedded in static HTML documents. These embeddings are called "server side includes"; you can usually recognize such HTML documents by file name extensions like .shtm or .shtml. Server side includes are not covered in this basic course.
The data is transferred like this
In either case the http server first receives the request. If the server detects a call to a CGI program, it creates an environment for it. This consists of
- a set of environment variables
- a set of input and output channels
Environment variables are nothing more than retrievable strings with fixed names. The shell (command processor) also has a set of environment variables, which you can display with the "set" command. To do so, switch to a terminal (MS-DOS prompt on Windows) and enter the following:
set [RETURN]
You will get a list of all environment variables of your shell (here is an example):
TMP=C:\WINDOWS\TEMP
TEMP=C:\WINDOWS\TEMP
PROMPT=$p$g
winbootdir=C:\WINDOWS
COMSPEC=C:\WINDOWS\COMMAND.COM
PATH=C:\WINDOWS;C:\WINDOWS\COMMAND;C:\PP\BIN\GO32V2
windir=C:\WINDOWS
BLASTER=A220 I5 D1 T4
...
Or on Linux
BASH=/bin/bash
BASH_VERSINFO=([0]="2" [1]="03" [2]="0" [3]="1" [4]="release" [5]="i686-pc-linux-gnu")
BASH_VERSION='2.03.0(1)-release'
COLORTERM=1
COLUMNS=80
ENLIGHTENMENT_ROOT=/usr/X11R6/lib/X11/enlightenment
EUID=0
GNOMEDIR=/opt/gnome
GS_FONTPATH=/usr/share/lilypond/afm:/usr/share/lilypond/pfa
HISTCONTROL=ignoredups
HISTFILE=/root/.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/home/tdbengine
HOSTNAME=uli
HOSTTYPE=i686
...
Environment variables are very popular for passing information from one program to another (similar to the clipboard, which is popular in many programs), because programs can both read and set such variables.
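How one program hands information to another through the environment can be tried out with a small Python sketch. The variable name GREETING is invented for this illustration; the child process here is simply a second Python interpreter.

```python
import os
import subprocess
import sys

# Copy the current environment and add one variable for the child.
# GREETING is a made-up name used only for this illustration.
child_env = dict(os.environ)
child_env["GREETING"] = "hello from the parent"

# The child is a second Python process that prints the variable it inherited.
child_code = "import os; print(os.environ['GREETING'])"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.strip()
print(received)
```

An http server does essentially the same thing when it starts a CGI program: it prepares a set of variables and starts the program with that environment.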
The http server also creates a set of environment variables that reflect the information sent by the client. In addition, the http server uses environment variables to inform the program being called about the server's own properties.
The CGI environment variables
Notice: You don't have to memorize the following list, because on the one hand tdbengine prepares the most important variables completely for you, and on the other hand you can look them up online at any time if you need them.
The CGI specification says that at least the following environment variables have to be created:
Server-specific environment variables
GATEWAY_INTERFACE
This environment variable contains the revision of the CGI specification that the server supports.
Format: CGI/<revision>
SERVER_NAME
The SERVER_NAME variable contains the name of the computer the server software is running on. It is given as the host name of the server, as a DNS alias or as an IP address.
For example: www.cs.tu-berlin.de
SERVER_SOFTWARE
This variable contains the name and the version of the WWW server that started the CGI script.
Format: <name>/<version>
DOCUMENT_ROOT
This variable contains the path name of the document directory of the WWW server as specified in the server configuration.
For example: /usr/local/www/doc
Request-specific environment variables
The values of the environment variables in this section are request-specific. They depend on the request that was directed to the server.
AUTH_TYPE
For protected scripts this variable states the authentication method in use.
For example: Basic
CONTENT_LENGTH
With METHOD="PUT" or "POST", CONTENT_LENGTH contains the length in bytes of the available data, as stated by the client.
With METHOD="GET", CONTENT_LENGTH is empty.
CONTENT_TYPE
For requests that transfer data to the server, such as HTTP PUT or POST, this variable contains the type of the data (MIME type).
For example (online form): application/x-www-form-urlencoded
HTTP_ACCEPT
This variable contains a list of the MIME content types the client can understand, as given in the HTTP header.
The individual elements are separated by commas. Format: <type>/<subtype>, <type>/<subtype>, ...
HTTP_FROM
This variable contains the e-mail address of the user who issued the request. Not all browsers support the transfer of the user's e-mail address.
HTTP_REFERER
This variable contains the URL of the document the client had requested before it referenced the CGI script.
HTTP_USER_AGENT
This variable provides information about the client software (Netscape, Mosaic, ...) by which the CGI script was activated.
For example: Mozilla/2.0 (Win16; I)
PATH_INFO
There are several ways to pass parameters to a CGI script when it is started.
One of them is to append the information to the URL referencing the script (separated by a slash '/').
This information (including the leading '/') is then contained in the environment variable PATH_INFO.
Unfortunately this method of parameter passing is the least robust and the least clean. The variable was originally intended to hold the file name that follows the virtual CGI script path. A CGI script is accessed using a virtual path name (for example '/CGI/'). If a script file is now referenced with the URL 'http://<server>/CGI/datei', PATH_INFO will contain the value '/datei'. But this value is URL-encoded, which means that all special characters that occurred in the URL are encoded.
For example, a space is not allowed in a URL and is replaced by a '+' symbol; since all spaces are encoded as '+', it is no longer possible to distinguish between a '+' standing for a space and a real plus sign.
You should therefore avoid passing parameters via PATH_INFO, especially since there is a more convenient method using the environment variable QUERY_STRING.
PATH_TRANSLATED
As mentioned above, the PATH_INFO variable was mainly intended for passing file names. But these file names are of no use if the location in the file system is not passed along as well. This is the job of PATH_TRANSLATED. The server runs the content of PATH_INFO through its mapping system and replaces all virtual path components with physical ones. So with '/datei' as the value of PATH_INFO and a mapping of '/*' to '/usr/local/WWW/pub/*', the variable PATH_TRANSLATED delivers the value '/usr/local/WWW/pub/datei'.
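The mapping from a virtual to a physical path can be sketched in a few lines. The function translate_path is a hypothetical helper that only handles the simple '/*' mapping of the example; real servers support more elaborate mapping rules.

```python
import posixpath

def translate_path(path_info, physical_root):
    """Map a virtual path such as '/datei' onto a physical directory."""
    # Normalise first so that '..' segments are resolved relative to '/'
    # and cannot climb out of the physical root.
    clean = posixpath.normpath("/" + path_info.lstrip("/"))
    return posixpath.join(physical_root, clean.lstrip("/"))

print(translate_path("/datei", "/usr/local/WWW/pub"))
```

The normalisation step is a design choice of this sketch: without it, a PATH_INFO containing '..' could point outside the published directory.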
QUERY_STRING
The environment variable QUERY_STRING is set in one of the following three cases:
1. The CGI script is called from a document that allows a search index to be entered using the ISINDEX tag. The search term is assigned to the environment variable QUERY_STRING.
2. The script is called from a clickable (sensitive) inline image. In this case QUERY_STRING contains the coordinates of the mouse click in the image.
3. The script is the recipient of data from an online form that was sent to the script using the "GET" method.
In each of the three cases the WWW client appends a question mark followed by the particular data to the URL referencing the script. With ISINDEX this data is the search term, with sensitive images it is the mouse coordinates, and with online forms it is the form data.
REMOTE_ADDR
This variable contains the IP address of the client computer.
For example: 130.149.18.37
REMOTE_HOST
The environment variable REMOTE_HOST contains the name of the computer the request came from.
If the server does not have this information, because the requesting computer has no domain entry, this variable is empty.
In that case REMOTE_ADDR should be able to help further.
For example: quofum.cs.tu-berlin.de
REMOTE_IDENT
If an authentication server conforming to RFC 931 is running on the client system, the WWW server can find out the identifier of the client and pass it to the CGI script in REMOTE_IDENT. Use this information with caution, and at best for logging purposes, because it is not trustworthy in all cases.
REMOTE_USER
For documents protected by user identification, this variable gives the user name. It does not necessarily have to be identical to the UNIX user name.
REQUEST_METHOD
The method with which the request was made can be found in the environment variable REQUEST_METHOD. Examples for HTTP as the server protocol are "GET", "HEAD", "PUT", "POST" and so on.
SCRIPT_NAME
This environment variable contains the file name of the script including the virtual path to it. It is mainly useful for scripts that reference themselves, because otherwise scripts cannot know under which virtual path name they can be reached.
SERVER_PORT
This variable contains the port number the request was sent to (generally: port 80).
SERVER_PROTOCOL
The name and the version of the protocol with which the request to the server was made can be found in the environment variable SERVER_PROTOCOL.
Format: <protocol>/<revision>
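A CGI program written in Python could read these variables roughly as follows. This is a sketch, not part of tdbengine: describe_request and the "client" key are names invented here, and the sample dictionary stands in for the environment a server would create (its SCRIPT_NAME value is hypothetical).

```python
import os

def describe_request(env=os.environ):
    """Collect a few CGI variables, using "" for any that are not set."""
    names = ("REQUEST_METHOD", "SCRIPT_NAME", "PATH_INFO", "QUERY_STRING",
             "REMOTE_HOST", "REMOTE_ADDR", "HTTP_USER_AGENT")
    info = {name: env.get(name, "") for name in names}
    # REMOTE_HOST may be empty; REMOTE_ADDR is the fallback in that case.
    info["client"] = info["REMOTE_HOST"] or info["REMOTE_ADDR"]
    return info

# Sample values taken from the examples in this lesson.
sample = {
    "REQUEST_METHOD": "GET",
    "SCRIPT_NAME": "/CGI/script",   # hypothetical value
    "REMOTE_ADDR": "130.149.18.37",
    "HTTP_USER_AGENT": "Mozilla/2.0 (Win16; I)",
}
for name, value in describe_request(sample).items():
    print(f"{name}={value}")
```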
Only the following of them are important to a CGI programmer:
SCRIPT_NAME      which script has to be run
PATH_TRANSLATED  the real path to the requested file
QUERY_STRING     the additional information transferred with the URL
Notice: The tdbengine prepares this information fully automatically.
The input and output channels
But this information is not sufficient. In particular, the CGI program has no way of informing the client about its result (output) in this manner.
It would be conceivable for the CGI program to change one of the environment variables (writing its output into it), whereupon the http server transfers it to the client. But that would have two serious disadvantages: the memory available for the environment is limited, so the output of the CGI program would be limited, too. And the http server would have to take over the transfer of the data, which would restrict its availability.
That is why the CGI standard provides that the output channel is passed on to the client directly. The limited size of the environment is also the reason why an additional channel is created for larger transfers of information from the client to the server.
Consequently, two information channels are set up when a CGI program starts:
- StdOut for information transfer from CGI program to the client
- StdIn for information transfer from client to CGI program
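A minimal CGI program that uses StdOut might look like the following Python sketch. One detail goes beyond what this lesson covers and is stated here as background: a CGI program normally writes a header line such as Content-Type, followed by an empty line, before the actual document. The function name build_response is made up for this example.

```python
import sys

def build_response(body_html):
    """Return the text a CGI program writes to StdOut: header, blank line, body."""
    header = "Content-Type: text/html\n\n"
    return header + body_html

def main():
    page = "<html><body><h1>Hello from a CGI program</h1></body></html>\n"
    # Whatever is written to StdOut is passed on to the client.
    sys.stdout.write(build_response(page))

if __name__ == "__main__":
    main()
```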
While StdOut is used practically always (a CGI program has to return something), StdIn is used only in specific cases.
get and post
The CGI standard provides two methods for transferring information from the client to the server:
get
Here all information is passed in the URL. The additional information is separated from the actual URL by a question mark "?". The http server transfers everything that follows the question mark into the environment variable QUERY_STRING.
The individual information components are separated by the "&" symbol.
Example:
http://www.tdb-engine.de/cgi-tdb.prg?command=read&page=main.
QUERY_STRING=command=read&page=main.
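On the program side, a QUERY_STRING like the one above has to be split at "&" and "=" again. With tdbengine this happens automatically; in Python the standard library function parse_qs does the same job. The wrapper read_get_parameters is a name chosen for this sketch.

```python
import os
from urllib.parse import parse_qs

def read_get_parameters(env=os.environ):
    """Decode QUERY_STRING into a dict that maps each name to a list of values."""
    query = env.get("QUERY_STRING", "")
    # keep_blank_values keeps fields that were submitted empty.
    return parse_qs(query, keep_blank_values=True)

params = read_get_parameters({"QUERY_STRING": "command=read&page=main"})
print(params)
```

Each name maps to a list because the same name may occur several times in one request.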
post
Here all information is transferred through the StdIn channel. It follows that a normal link <a href="..."> can only use the get method (because only the URL is transferred here). Only in forms can you choose which method is to be used:
<form method="get" ...>   -> get
<form method="post" ...>  -> post
Notice: In forms even a mixed form is possible, because on the one hand the URL for calling the program is given (action="..."), so it can contain a get part, and on the other hand the form fields can be transferred using the "post" method.
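Reading post data from StdIn could be sketched like this in Python. The program reads exactly CONTENT_LENGTH bytes and then decodes them the same way as a query string. The function name and the simulated request at the end are assumptions made for this illustration.

```python
import io
import os
import sys
from urllib.parse import parse_qs

def read_post_parameters(env=os.environ, stdin=None):
    """Read CONTENT_LENGTH bytes from StdIn and decode them as form fields."""
    if stdin is None:
        stdin = sys.stdin.buffer
    length = int(env.get("CONTENT_LENGTH") or 0)
    # Read only as many bytes as announced by the client.
    body = stdin.read(length).decode("ascii")
    return parse_qs(body, keep_blank_values=True)

# Simulated request: in a real CGI call the server provides both values.
fake_body = b"Firstname=Hans&Name=Mueller"
fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(fake_body))}
fields = read_post_parameters(fake_env, io.BytesIO(fake_body))
print(fields)
```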
The form fields and their values are transferred to the server (and thus to the CGI program) as follows:
<input type="text" name="xyz"> xyz=user's entry
<input type="hidden" name="xyz"> xyz=user's entry (uncoded!)
<input type="checkbox" name="xyz" value="1"> xyz=1, if selected by user
<input type="radio" name="xyz" value="1"> xyz=1, if this option is selected
<input type="radio" name="xyz" value="2"> xyz=2, if this option is selected
<select name="xyz"> xyz=1, if this option is selected
<option value="1">
<option value="2">...
</select>
<select name="xyz" multiple> xyz=1&xyz=2.. if these options are selected
<option value="1">
<option value="2">...
</select>
<textarea name="xyz">...</textarea> xyz=content of the text
<input type="submit" name="xyz" value="done"> xyz=done if this button is activated
Example:
A document contains the following form:
<form action="http://www.tdb-engine.de/cgi-tdb/savemail.prg" method="get">
E-Mail: <input type="text" name="email"><br>
Name: <input type="text" name="name"><br>
<input type="submit" name="done" value="send">
</form>
The user fills in the two fields with "info@tdb-engine.de" and "Webmaster".
The browser then sends the following URL to the http server:
http://www.tdb-engine.de/cgi-tdb/savemail.prg?email=info@tdb-engine.de&name=Webmaster&done=send
Unless the form specifies otherwise, the data entered by the user is put into a special form called url-encoding.
Many characters must not be used when transferring data in a normal URL (for example: space, umlauts, &/?...).
That is why such characters are converted into characters allowed in a URL before the transfer. Since the browser takes care of the encoding (at least for forms) and, on the server side, tdbengine (but not the http server) takes care of all the decoding, we will not go into this topic any further.
You can observe the encoding, for example, by entering a search term with some umlauts into a search engine and looking at the resulting URL.
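For readers who want to see url-encoding without a browser, Python's standard library offers an encoder and a decoder. Note the assumptions: Python encodes umlauts as UTF-8 by default, while a browser uses the character set of the page, and Python's encoder also converts characters such as "@". The sample text is invented for this sketch.

```python
from urllib.parse import quote_plus, unquote_plus, urlencode

# Encode a single value: spaces become "+", other special characters "%XX".
text = "Müller & Söhne"
encoded = quote_plus(text)
print(encoded)

# Decoding restores the original text.
decoded = unquote_plus(encoded)
print(decoded)

# Encode a whole set of form fields in one step.
form_fields = {"email": "info@tdb-engine.de", "name": "Webmaster", "done": "send"}
query = urlencode(form_fields)
print(query)
```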
Notice: The extended protocol multipart/form-data is not discussed here.
Context
In this lesson the basic elements were discussed. You should now know what a CGI program is, where it runs and how the transfer of information between the client and the server works.
Challenges:
1. How does the http server know that there is a CGI call?
2. Call the following URL on the internet: http://www.tdb-engine.de/scripts/set.prg
You will get a list of (almost) all environment variables that the http server (here an Internet Information Server) provides to a CGI program. Which of these environment variables can you use to identify the user's browser software? What would this entry be for your browser?
3. What is the CGI program that answers queries on Yahoo called?
4. You have the following form on an HTML page:
<form action="http://www.meinedomain.de/cgi-tdb/log_in.prg" method="get">
First name: <input type="text" name="Firstname"><br>
Name: <input type="text" name="Name"><br>
<input type="submit" name="command" value="send">
</form>
The user fills the fields with "Hans" and "Mueller" and hits "send".
What does the URL that the client sends to the server look like?
5. What does the environment variable QUERY_STRING contain in this case (Challenge 4)?