Re: [phplib] default authentication, anyone using it?
From: Kristian Koehntopp (kris <email protected>)
Date: 08/06/01

In netuse.lists.phplib you write:
>> But, does an application want to store its internal data in XML
>> representation?

>When you store something you put it out, kind of export it.
>If it is stored also for others to use, why not?

Both XML BLOBs and SQL tables store data in a format that is
suitable for others to use. XML BLOBs however do not lend
themselves to vertical queries ("find the average age of all
registered users from the userdata we have") as well as SQL
tables do. SQL tables perform better at the cost of additional
table design (need one column for each value that is to become
part of userdata). I can see two ways around this additional
cost:

1. Wrapping

Wrap this additional table design overhead into code, so that
creating additional columns for additional data items is as
simple as $user->register("age") - but at much better
performance than XML BLOBs could ever hope to have.
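
A minimal sketch of the wrapping idea, written in Python with the
standard sqlite3 module instead of PHP and PHPLIB so that it runs on
its own. The table name userdata_wide and the helper register_column
are hypothetical; they only stand in for what a call such as
$user->register("age") might do behind the scenes.

```python
import re
import sqlite3

ALLOWED_TYPES = ("TEXT", "INTEGER", "REAL")

def register_column(conn, name, sqltype="TEXT"):
    """Add a column for a new userdata item unless it already exists."""
    # Identifiers cannot be bound as query parameters, so validate them.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError("invalid column name: %r" % name)
    if sqltype not in ALLOWED_TYPES:
        raise ValueError("unsupported type: %r" % sqltype)
    # PRAGMA table_info yields one row per column; index 1 is the name.
    existing = [row[1] for row in conn.execute("PRAGMA table_info(userdata_wide)")]
    if name not in existing:
        conn.execute("ALTER TABLE userdata_wide ADD COLUMN %s %s" % (name, sqltype))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userdata_wide (p_user_id TEXT PRIMARY KEY)")

register_column(conn, "age", "INTEGER")
register_column(conn, "age", "INTEGER")  # registering twice is a no-op

conn.executemany(
    "INSERT INTO userdata_wide (p_user_id, age) VALUES (?, ?)",
    [("0123456789ABCDEF0123456789ABCDEF", 33),
     ("DEADBEEFDEADBEEFDEADBEEFDEADBEEF", 19)],
)

# The vertical query is a plain aggregate over one typed column.
average_age = conn.execute("SELECT AVG(age) FROM userdata_wide").fetchone()[0]
print(average_age)
```

The schema change is hidden inside one function, and the vertical
query stays a simple aggregate over a real column.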

2. Name-Value tables

Alternatively, require that all userdata items be of a specific
type (for example, they must fit into a varchar(255) column).
Define userdata as

create table userdata (
  p_ud_id integer not null auto_increment primary key,
  p_user_id varchar(32) not null references auth_user (p_user_id),
  name varchar(80) not null,
  value varchar(255) not null,
  unique (p_user_id, name),
  index (name)
);

You can then have userdata as name-value pairs as in

(1, "0123456789ABCDEF0123456789ABCDEF", "background-color", "#ff0000" )
(2, "0123456789ABCDEF0123456789ABCDEF", "age", "33" )
(3, "DEADBEEFDEADBEEFDEADBEEFDEADBEEF", "age", "19" )

and so on.

This can be extended without changes to the schema definition,
but is slightly slower to query than a single-row-per-user
schema.
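
As an illustration, here is a sketch of the name-value table in
Python with the standard sqlite3 module (not MySQL, so the DDL is
adapted slightly and the foreign key to auth_user is left out). The
"language" item is an invented example of a data item added later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE userdata (
        p_ud_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        p_user_id VARCHAR(32)  NOT NULL,
        name      VARCHAR(80)  NOT NULL,
        value     VARCHAR(255) NOT NULL,
        UNIQUE (p_user_id, name)
    )
""")
conn.execute("CREATE INDEX userdata_name ON userdata (name)")

conn.executemany(
    "INSERT INTO userdata (p_user_id, name, value) VALUES (?, ?, ?)",
    [("0123456789ABCDEF0123456789ABCDEF", "background-color", "#ff0000"),
     ("0123456789ABCDEF0123456789ABCDEF", "age", "33"),
     ("DEADBEEFDEADBEEFDEADBEEFDEADBEEF", "age", "19")],
)

# A new kind of data item needs no schema change, only a new name.
conn.execute(
    "INSERT INTO userdata (p_user_id, name, value) VALUES (?, ?, ?)",
    ("DEADBEEFDEADBEEFDEADBEEFDEADBEEF", "language", "de"),
)

# Vertical query: possible, but every value has to be cast from text.
average_age = conn.execute(
    "SELECT AVG(CAST(value AS INTEGER)) FROM userdata WHERE name = ?",
    ("age",),
).fetchone()[0]

# Rebuilding a one-row-per-user view costs one self-join per item.
per_user = conn.execute("""
    SELECT a.p_user_id, a.value AS age, b.value AS background
    FROM userdata a
    LEFT JOIN userdata b
           ON b.p_user_id = a.p_user_id AND b.name = 'background-color'
    WHERE a.name = 'age'
    ORDER BY a.p_user_id
""").fetchall()

print(average_age)
print(per_user)
```

The self-join in the last query is where the extra query cost
compared to a single-row-per-user schema comes from.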

>> go. The application we are talking about should be able to dump
>> the respective data into some XML format, and be able to
>> reimport that XML dump into its internal format, even if it has
>> been processed by some other application inbetween.

>Processed is a vague definition. Read poses no problem, and this
>normally cover a good part of the cases.
>Modified by a single user yes, if you write the application on both
>sides. And with this you cover most cases, almost all if we are speaking
>about user data.

If you write the applications on both sides, you do not need XML
at all, because you control the formats. The most efficient idea
here would be to use some format that has as little parsing
overhead as possible. Also, you do not need self-describing
formats in most cases.

The whole point of XML is that readers and writers do not know
of each other's requirements, or - in extreme cases - do not even
know anything in particular about the structure and/or semantics
of the data they are processing (besides the fact that it is
well-formed XML).

>> Advantage:
>>
>> - documented format that is easily parseable by any application
>> (as opposed to some PHP program that represents the data and
>> which can only be parsed by a PHP parser).

>You can invent new data on the fly. This is the biggest advantage.

Right.

- This can be emulated on the SQL side only at extra cost, if
  you insist on "single row per user" schemas.

- It is as simple as in XML, if you can live with name-value pair
  tables as shown above. Name-value pair tables do allow for
  limited amounts of vertical queries, though.

>There is is a practical, reasonable mark for 'consolidated'
>structures. Further one must provide the possibility to stitch
>free form fringes. They can always become consolidated later,
>if you see need.

>I don't deny the necessity of consolidated structures. But I
>think that not to provide for provisional creation of
>'unconsolidated', free form, fringes of information is an
>error.

This is in fact a question that touches aspects of design
philosophy. I hope you don't mind if I explain my view of this.

The way I see things is as follows: Session data is a kind of
bag that contains anything the particular application knows
about the current session's user. This we call the application
state.

Application state is used to accumulate data in free form from
consolidated tables, the way items are collected in a shopping
cart object that is part of the session. At some point the
application has accumulated enough data or reached a certain
internal state.

At this point the application commits the free form data from
its "session bag" back into (different) consolidated tables. In
our hypothetical shopping application that point would be the
checkout and order pages, where the user decides to actually
review and place the order.

This particular example is in fact a very good one, because in a
properly designed application the session "bag" is actually
carrying data over from a normalized "live data" representation
of the shop into a denormalized "star tables" structure of an
order log and data warehouse. You wouldn't want to keep
timestamped price data series in your live data, you only keep
one single "current" price for an article or set of articles (an
"offer"). If the price changes, you overwrite the current price
and keep no record of old prices in your live data. (See "Oracle
Design", O'Reilly and Associates, Chapter 7: "Temporal Design",
to understand why every other solution sucks big time and should
be avoided at all cost).

Because that particular price is variable over time, you cannot
just enter an article number into an order log, but you need to
copy all critical data that may vary from the live tables into
the session, and from the session into the order log. As a nice
side effect, this lends itself to data warehouse design and star
tables automatically, so that you can generate aggregates and
statistics ex post with little additional design ("Oracle Data
Warehousing", Coriolis Group Books, gives an introduction to
that).
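
A small sketch of that copy path, in Python with the standard
sqlite3 module. The table and column names (article, order_log,
price_cents) and the helpers add_to_cart and checkout are invented
for illustration; a plain list stands in for the session "bag".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- live data: exactly one current price per article
    CREATE TABLE article (
        article_id  INTEGER PRIMARY KEY,
        title       TEXT,
        price_cents INTEGER
    );
    -- order log: denormalized copy of everything that may change later
    CREATE TABLE order_log (
        order_id    INTEGER,
        article_id  INTEGER,
        title       TEXT,
        price_cents INTEGER,
        quantity    INTEGER
    );
    INSERT INTO article VALUES (1, 'T-shirt', 1500);
""")

def add_to_cart(conn, cart, article_id, quantity):
    """Copy the variable data from the live table into the session."""
    title, price = conn.execute(
        "SELECT title, price_cents FROM article WHERE article_id = ?",
        (article_id,),
    ).fetchone()
    cart.append({"article_id": article_id, "title": title,
                 "price_cents": price, "quantity": quantity})

def checkout(conn, cart, order_id):
    """Commit the free-form cart into the consolidated order log."""
    conn.executemany(
        "INSERT INTO order_log VALUES (?, ?, ?, ?, ?)",
        [(order_id, item["article_id"], item["title"],
          item["price_cents"], item["quantity"]) for item in cart],
    )
    cart.clear()

cart = []  # stands in for the session "bag"
add_to_cart(conn, cart, 1, 2)
checkout(conn, cart, 1001)

# A later price change simply overwrites the current price ...
conn.execute("UPDATE article SET price_cents = 1800 WHERE article_id = 1")

# ... and the order log still holds the price the customer saw.
logged_price = conn.execute(
    "SELECT price_cents FROM order_log WHERE order_id = 1001").fetchone()[0]
live_price = conn.execute(
    "SELECT price_cents FROM article WHERE article_id = 1").fetchone()[0]
print(logged_price, live_price)
```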

        For example, you would have user ids and (denormalized)
        user addresses in your order log, as user addresses may
        change over time, and you would want to have old user
        addresses on old orders, and new user addresses on new
        orders of the same uid. Similar rules would exist for
        prices of items in the order log.

        You would then generate a list of aggregated ZIP code
        zones from the order log, and see how many items, and
        how much turnover, each region you are looking at
        generates. You would do similar things along different
        dimensions, such as articles, user ids, times of day,
        days of the month, months of the year, price range by
        user id and so on. You get a number of pre-calculated
        aggregates that group themselves around your order log
        "fact tables" - the typical star table schema in a data
        warehouse.

        I believe that is what you were trying to explain with
        your rebate scheme and customer behaviour analysis
        example?
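
The ZIP code aggregate described above can be sketched as follows,
again in Python with sqlite3. The order_log layout, the sample rows
and the definition of a "zone" as the first digit of the ZIP code are
all assumptions made for the sake of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_log (
        order_id    INTEGER,
        uid         TEXT,
        zip         TEXT,     -- address data as it was at order time
        article_id  INTEGER,
        price_cents INTEGER,  -- price as it was at order time
        quantity    INTEGER
    )
""")
conn.executemany(
    "INSERT INTO order_log VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "u1", "24103", 1, 1500, 2),
     (2, "u1", "76131", 1, 1800, 1),   # same uid, after moving
     (3, "u2", "24118", 2,  900, 3)],
)

# One aggregate along the "region" dimension: items and turnover per zone.
zones = conn.execute("""
    SELECT substr(zip, 1, 1)           AS zone,
           SUM(quantity)               AS items,
           SUM(price_cents * quantity) AS turnover_cents
    FROM order_log
    GROUP BY zone
    ORDER BY zone
""").fetchall()

for zone, items, turnover_cents in zones:
    print(zone, items, turnover_cents)
```

Because the address and the price were copied into the order log,
the old order of user u1 still counts for the old region at the old
price.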

The session is the general, form-free and requirement-free
vehicle that is being used to carry this data over, and
checkout-driven user commits are triggering that data copy
operation. Since we cannot predict the actual structure of the
data being copied, and since the data is being restructured and
denormalized during the copy and commit operation anyway, the
Session is correctly designed as form-free.

This is not entirely true for user preferences data. This kind
of data is formatted; often it consists of name-value pairs or
name-(array of values) pairs. Also, unlike sessions, which hold
only short-lived data that is finally committed back into
structured tables, user preferences data has a long lifetime and
there is a need to perform vertical queries over user
preferences.

There is little need to perform vertical queries over session
data besides the garbage collection - and this is precisely why
the changed column is an extra column in active_sessions. We
could easily have had $sess->register("changed") as we do with
$sess->register("auth") and then setting $auth->auth["exp"]. We
are not doing that, because a garbage collection is a vertical
query on active_sessions, and doing it with "changed" being part
of active_sessions.val would be prohibitively expensive.
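
To illustrate why the separate column matters, here is a sketch of
such a garbage collection in Python with sqlite3. The table layout
is a simplified stand-in for active_sessions, the function name gc
and the sample rows are invented, and the timestamps are assumed to
be fixed-width YYYYMMDDhhmmss strings so that string comparison
matches time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE active_sessions (
        sid     TEXT,
        name    TEXT,
        val     TEXT,   -- opaque, serialized session "bag"
        changed TEXT,   -- last modification, kept outside of val
        PRIMARY KEY (name, sid)
    )
""")
conn.execute("CREATE INDEX active_sessions_changed ON active_sessions (changed)")

conn.executemany(
    "INSERT INTO active_sessions VALUES (?, ?, ?, ?)",
    [("aaa", "Example_Session", "<serialized data>", "20010806120000"),
     ("bbb", "Example_Session", "<serialized data>", "20010801090000")],
)

def gc(conn, name, cutoff):
    """Delete all sessions of this name last changed before cutoff."""
    cur = conn.execute(
        "DELETE FROM active_sessions WHERE name = ? AND changed < ?",
        (name, cutoff),
    )
    return cur.rowcount

removed = gc(conn, "Example_Session", "20010805000000")
remaining = [row[0] for row in conn.execute("SELECT sid FROM active_sessions")]
print(removed, remaining)
```

The DELETE never has to look into val. With "changed" stored inside
the serialized bag, every row would have to be fetched and
unserialized just to decide whether it has expired.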

        In fact there is a certain interest for some people to
        have $auth->auth[] available in a more structured form
        as part of the active_sessions table, and I know of at
        least one person that modified PHPLIB to hold the
        information originally stored in $auth->auth[] in
        additional columns of active_sessions.*.

        This allows for vertical queries on user authentication
        information, for example "how many users are logged in?"
        or "is this user logged in more than once?". These things
        are prohibitively expensive with the default auth.inc
        implementation, but the default auth.inc allows you to
        drop in alternative authentication methods, even such
        methods that require additional data fields in
        $auth->auth[], such as for example $auth->auth["passwd"]
        or $auth->auth["dn"] or similar.
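
A sketch of what such a modified table makes possible, in Python
with sqlite3. The extra columns auth_uid and auth_exp are
hypothetical names for data that would otherwise live inside
$auth->auth[]; this is not the actual layout of any particular
PHPLIB modification, and the timestamps are made-up integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE active_sessions (
        sid      TEXT PRIMARY KEY,
        val      TEXT,
        changed  TEXT,
        auth_uid TEXT,     -- hypothetical: user id, NULL if not logged in
        auth_exp INTEGER   -- hypothetical: expiry time of the login
    )
""")
conn.executemany(
    "INSERT INTO active_sessions (sid, auth_uid, auth_exp) VALUES (?, ?, ?)",
    [("s1", "alice", 1600),
     ("s2", "alice", 1900),   # alice is logged in a second time
     ("s3", "bob",   1200),
     ("s4", "carol",  900),   # login already expired
     ("s5", None,    None)],  # anonymous session
)

now = 1000  # stand-in for the current time

# "How many users are logged in?"
logged_in = conn.execute(
    "SELECT COUNT(DISTINCT auth_uid) FROM active_sessions WHERE auth_exp > ?",
    (now,),
).fetchone()[0]

# "Is this user logged in more than once?"
multiple = conn.execute("""
    SELECT auth_uid, COUNT(*)
    FROM active_sessions
    WHERE auth_exp > ?
    GROUP BY auth_uid
    HAVING COUNT(*) > 1
""", (now,)).fetchall()

print(logged_in, multiple)
```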

So these are the ideas and thoughts behind my proposal to turn the
User class into something that uses a data storage different
from the Session class, and why I think that XML would be of
limited use as a data storage container here.

I still believe that there should be something in PHP or PHPLIB
that can take all Session data, or all User data, and emit it as
XML as an answer to some XMLRPC request or similar (I believe
the WDDX extension to PHP already does that, but I have not
checked). I just do not believe that it would be useful to
actually store Session or User data in XML format.

Kristian

-- 
	http://www.amazon.de/exec/obidos/wishlist/18E5SVQ5HJZXG
		"bow down before the one you serve.
		 you're going to get what you deserve."
			-- Trent Reznor (Sysadmin?)
