The semaphore concept of the tdbengine
Many tdbengines at once
In theory, any number of tdbengines can work simultaneously. Each instance takes about 580 KByte of memory (plus the memory its variables take), so with only 64 MByte of free memory, more than 100 engines can operate at the same time. This should be enough for even the best-visited sites.
Assuming that the average duration of a single instance, including loading, initialization and cleanup (under load, while several other programs are running), is about 0.1 seconds, about 3,600,000 requests per hour can be handled, which is a figure most website operators can only dream of.
Of course, there are programs that run longer than the proposed 0.1 seconds. You should think about optimizing your program as soon as its running time exceeds a second. Even with an average running time of about a second, about 360,000 requests per hour would be possible, if the processes were really independent of each other.
Synchronization is necessary...
I am sorry to say: they are not. As soon as programs begin to write data, they must not disturb each other. Without going into too much detail, just one example: a write access to an index file of a table sometimes causes a complete reconstruction of the B-tree (tdbengine stores indexes, and sometimes even tables, in such structures). If another process tries to access this index during reconstruction, it won't be able to read anything useful. Transactions (that is, the transitions between stable states) need to be secured. In process technology terms, these transitions are called "critical states", which must be carried out by a process in isolation, undisturbed by other processes.
Thus, processes that work on a common data stock need to be synchronized. In tdbengine, this synchronization is done by so-called semaphores. Just think of a semaphore as a guardian at the beginning of a critical path who admits only as many processes as the path can bear. If only one process may enter the path at a time (which is the most common case), we call it a binary semaphore (go/wait). Currently, tdbengine supports only binary semaphores.
In the beginning, tdbengine always plays it safe: there is only a single semaphore responsible for all programs, and it lets only one single process pass at a time. All others must wait. Now the calculation from above looks quite different. Again, we assume 0.1 seconds average running time per process; now we can handle only 36,000 requests per hour. The possibility of parallel execution doesn't enhance performance, but only provides a buffer for a maximum of 100 requests, so that short peaks of about 1,000 requests per second can be absorbed. These pile up in the waiting queue.
...and eats up performance
Assume we have a program that takes a long time to run - let's say 30 seconds - because a large number of records needs to be exported. During this time, all other processes are sent to the waiting queue, even if they have nothing to do with this range of records. The result is tons of CGI overruns and disappointed visitors who decide to head for other pages. That's not the way it should be!
And that is not the way it is. The initially radical security strategy of tdbengine is based on the fact that tdbengine doesn't know what the individual programs do, which data files they access, and whether "critical states" occur at all. Tdbengine doesn't know, but you do. So you can create new semaphores and make the data flow more performant on the one hand and more secure on the other. And that's what it is all about.
Now how to do it the right way
For this intervention we have two possibilities: either some entries in the configuration file "tdbengine.ini" or special semaphore functions in the program itself. First, we'll look at the possibilities of the configuration file.
Semaphores in the configuration file
Since version 6.2.7 of tdbengine, we have the possibility to use local configuration files instead of global ones. Let's make use of this feature and place a "tdbengine.ini" file into the folder in which the compiled EASY modules are located. This file should contain the following entries:
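A sketch of such a file, reconstructed from the keys discussed below (the exact key name for the overrun URL page is an assumption):

```ini
logcgi=1
log=./log/cgi.log
semadir=./sema
timeout=10000
overrun=/too_much_to_do.html
```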
All that's left to do is to create the folders "log" and "sema" below the prg folder and give the user who calls tdbengine (in most cases that's the anonymous HTTP user) the rights to create and change files there.
Next, you should create a nice little HTML file "too_much_to_do.html", which explains in the friendliest words that thousands of people are raiding this page right now and that's why the server is under stress. This way, the ugly "cgi-overrun" message is replaced. This page should be available via the URL you specified, as should the page "global_update.html", with which you can indicate that the database is being maintained at the moment and that the dynamic contents will be available again soon.
Let's have a closer look at the entries in the config file:
By specifying logcgi=1 you make tdbengine log all activities to a logfile. That way you gain an exact overview of what is going on. We also change the path to the logfile by setting log=./log/cgi.log.
semadir specifies the folder in which the semaphores are stored. These are created as empty files by tdbengine and locked or unlocked via system calls. The advantage of this is that if tdbengine doesn't terminate cleanly - this should never happen, but cannot be 100% excluded - the file locks are freed again.
Under timeout you specify the time (in milliseconds) a process waits before the overrun message is displayed. We set it to 10 seconds, which is a good figure for a web application. After 10 seconds at the latest, an impatiently waiting user wants to know what is going on.
Now we need to create a separate section in the config file for every program:
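A per-program section might look like this ("myprogram.prg" is a hypothetical module name; the meaning of the xxx placeholder is explained next):

```ini
[myprogram.prg]
sema=xxx
```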
If a program accesses the data stock in read-only mode, and no other program writes to the same data stock, we can replace the xxx with "nosema" for this program. In this case the guardian is dismissed, and any number of tdbengine instances are allowed to execute this program simultaneously.
All other programs which access a common data stock in read or write mode (or both) are assigned to a single semaphore.
For example: assume we want to program a guestbook with 3 individual programs.
All these programs access the data stock of the guestbook (tables, configuration files, etc.), either in read mode (all three) or in write mode (the last two). To keep them from getting in each other's way, we create a single semaphore for all of them:
# Guestbook - section
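Continuing the section started above, the guestbook block might look like this (the module names gb_show.prg, gb_add.prg and gb_admin.prg are hypothetical; any common name works for the shared semaphore):

```ini
[gb_show.prg]
sema=guestbook
[gb_add.prg]
sema=guestbook
[gb_admin.prg]
sema=guestbook
```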
Please note: lines beginning with the # character are interpreted as comments.
That's how we make sure that these programs are never executed in parallel, avoiding data chaos.
Semaphores in the program
In addition to the methods shown so far, there are also two EASY functions with which semaphores can be controlled much more finely.
To use these functions properly you need one certain piece of information: each program can deal with only one semaphore at a time.
Often the real database accesses need very little time, while the esthetic refinement of the data takes quite long. In a tiny fraction of a second a record is found and read. In fact, the lock could be lifted right then, even if the record is still in use by Getfield or Subst.
In most cases, after the last read or write, we could tell the guardian: "Alright, I am through. You can send in the next one." That's what the function EndSema() is for.
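A minimal sketch of this pattern, using only the functions mentioned in this article (the semaphore name 'mysema' and the commented steps are assumptions for illustration):

```
IF WaitSema('mysema',10000) THEN
  ErrorMessage('Database is busy')
// ... open the table and read the record ...
EndSema()  // the guardian lets the next process in
// ... lengthy formatting with Getfield/Subst runs outside the lock ...
```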
Please note: tdbengine does something very similar automatically: it buffers the CGI output, then unlocks the semaphore, and only afterwards sends the data to the webserver and thereby to the client. This transfer is certainly not critical.
The use of the function WaitSema() is a bit more delicate, since it is imperative to use it before the table is opened: while opening a table, a lot of relevant data is read (in contrast to closing a table, where no more data is written).
IF WaitSema('mysema',1000) THEN
  ErrorMessage('Table is locked')

IF WaitSema('mysema',5000) THEN
  ErrorMessage('Table is locked')

Lengthy updating processes
Finally, let's look at a practical example. On the internet there are many situations where a continuous update of the database is neither possible nor necessary. Just think of search engines. With that many requests, a sequentialization of processes by the use of semaphores would be absurd. So these processes have no write access at all, but collect all newly entered URLs in a separate table. Sometime at night, when the average load is low, the present data are copied and merged with the data collected that day. These - and possibly older URLs - are examined, the related sites are read, and all the relevant data go into search structures. Finally, the original database is replaced by the updated one.
Altogether, only a single short lock is necessary: while replacing the database with the updated copy.
Please note: the program which collects new URLs in an auxiliary table needs to set clean locks, of course, but they neither affect the search itself nor play a major role regarding execution time.
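In configuration terms, such a setup might be sketched like this (all program names are hypothetical): the search programs run unguarded, while the URL collector and the nightly update share one semaphore:

```ini
[search.prg]
sema=nosema
[collect_urls.prg]
sema=urlpool
[nightly_update.prg]
sema=urlpool
```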
You should use no semaphore at all (sema=nosema) if you only seldom need a long reorganization phase, and you should deactivate all reading programs during this phase. tdbengine has everything you need:
// Updating the database.
// The following programs access the database read-only:
// 1. Stop CGI execution for these programs.
VAR Konfig : STRING = 'tdbengine.ini'
// 2. Wait 10 seconds until all programs have terminated.
// 3. Update the database.
// 4. Allow CGI execution for all other programs again.