Archive for September, 2005

Structured vs unstructured information

September 13, 2005
This article was first written in September 2005 for the BeezNest technical
website (http://glasnost.beeznest.org/articles/293).

What is “structured information”?

It is information that is already divided into fields, such as “date”, “title”, “subject”, “unit price”, “quantity”, “total price”, “commission percentage”. This is typically what you find in a record of a relational database table.

When information is structured, it is usually relatively easy to search, since you can simply tell a program: give me the list of record numbers in the table CUSTOMERS where total sales is greater than 1,000 and the name starts with the letter A.
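As a rough illustration of such a query, here is a minimal sketch using Python's built-in sqlite3 module. The column names (id, name, total_sales) and the sample rows are assumptions made up for this example; the text above only names the CUSTOMERS table.

```python
import sqlite3

# In-memory database; the schema and rows are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, total_sales REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, total_sales) VALUES (?, ?, ?)",
    [
        (1, "Adams", 2500.0),
        (2, "Baker", 4000.0),
        (3, "Anderson", 300.0),
        (4, "Alvarez", 1000.5),
    ],
)


def matching_customer_ids(connection, min_sales, prefix):
    """Return ids of customers with sales above min_sales whose name starts with prefix."""
    rows = connection.execute(
        "SELECT id FROM customers WHERE total_sales > ? AND name LIKE ? ORDER BY id",
        (min_sales, prefix + "%"),
    )
    return [row[0] for row in rows]


result = matching_customer_ids(conn, 1000, "A")
print(result)  # [1, 4]
```

Both conditions refer to named, typed columns, which is why the whole search fits in a single WHERE clause.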

The drawback is that such RDBMSs usually require each “field” to have a fixed maximum size: a date can have at most 8 digits (yyyymmdd), a name can have at most 30 characters, and so on. This is because the information must fit into “columns” and “tables”, and it is difficult for such systems to efficiently handle data whose size varies significantly from one row to the next.

What is “unstructured information”?

Generally speaking, what people mean by that is text, such as can be found printed on a 2-page memo. Although there may be some visual structure for a human reader (it’s easy to find the date, whether it’s on the left or right side; it’s easy to find the subject once you read a couple of paragraphs), for a program it is something else altogether. Contrary to popular belief, the amount of unstructured information in this day and age is several orders of magnitude larger than the amount of structured information. Unstructured information does not fit easily in the “columns and rows” concept of relational databases: the text of a memo may contain 1 paragraph or 100 (especially mine), a book may have chapters of varying lengths, a technical description of an Airbus plane requires a few hundred boxes of drawings and text pages, etc. Thus relational database engines have trouble handling that kind of data, and must generally treat it as “blobs” [1] stored differently from the usual columns and rows, which in turn makes it harder for such programs to work with.

What is a mixture of structured and unstructured information?

Most information is like that. There may be some structured bits such as date, author, etc., but then you have one or more paragraphs of “description”. For all practical purposes, a mixture should be treated as unstructured. Information systems that can handle unstructured information can usually (though not always) handle structured information as well. The opposite is generally not true.
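To make the distinction concrete, here is a small Python sketch of a “mixed” record: a memo with two structured fields and a free-text body. The Memo class, its field names and the sample data are all hypothetical. Filtering on the date is a plain comparison, while finding memos about a topic means breaking the text into words first, which is only a very naive stand-in for what a real full-text engine does.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Memo:
    written_on: date  # structured field
    author: str       # structured field
    body: str         # unstructured free text of any length


# Hypothetical sample data, for illustration only.
MEMOS = [
    Memo(date(2005, 9, 1), "alice", "The new router needs a smaller MTU on the DSL link."),
    Memo(date(2005, 9, 13), "bob", "Quarterly sales figures are attached.\n\nPlease review them."),
    Memo(date(2005, 8, 20), "alice", "Reminder: the router firmware must be upgraded."),
]


def written_since(memos, start):
    """Structured search: a direct comparison on a typed field."""
    return [m for m in memos if m.written_on >= start]


def mentioning(memos, word):
    """Unstructured search: split the body into words, then look for the word."""
    wanted = word.lower()
    return [m for m in memos if wanted in re.findall(r"\w+", m.body.lower())]


recent = written_since(MEMOS, date(2005, 9, 1))
about_router = mentioning(MEMOS, "router")
print([m.author for m in recent])                        # ['alice', 'bob']
print([m.written_on.isoformat() for m in about_router])  # ['2005-09-01', '2005-08-20']
```

The date filter needs no knowledge of the memo's content, whereas the word search has to make its own decisions about case, punctuation and word boundaries.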

[1] “Binary Large OBject”: a collection of bits of undetermined length whose content is not really known, or at least not known to the DBMS.

Categories: English, Misc

HOWTO Change the MTU of an interface on Debian

September 1, 2005
This article was first written in September 2005 for the BeezNest technical
website (http://glasnost.beeznest.org/articles/290).

To change the MTU of an interface on GNU/Linux, you just need to tell ifconfig to do so, for example:

/sbin/ifconfig eth0 mtu 1492
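
To confirm that the new value took effect, one option on Linux is to read the mtu attribute that the kernel exposes under /sys/class/net. The following Python sketch is not part of the original article: the helper names read_mtu and mtu_matches are made up for this example, and it assumes sysfs is mounted at its usual location.

```python
from pathlib import Path


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Return the current MTU of a network interface as an int.

    Assumes a Linux kernel that exposes <sysfs_root>/<iface>/mtu.
    """
    return int(Path(sysfs_root, iface, "mtu").read_text().strip())


def mtu_matches(iface, expected, sysfs_root="/sys/class/net"):
    """Return True if the interface currently has the expected MTU."""
    return read_mtu(iface, sysfs_root) == expected
```

For instance, mtu_matches("eth0", 1492) should return True right after the ifconfig command above has succeeded.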

To change it permanently on Debian, put it in /etc/network/interfaces, where almost all network parameters are found. To do this, just add an mtu line to the definition of your interface. Example:

iface eth0 inet static
    address 192.168.0.1
    network 192.168.0.0
    gateway 192.168.0.254
    netmask 255.255.255.0
    mtu 1492

There is an exception, though.

Warning: the following is mostly obsolete in Sid and Etch

It seems that DHCP clients are not configured by default to do the same for dynamically assigned configurations [1]. So you need a workaround to achieve the same result. We’re going to use the pre-up feature of /etc/network/interfaces like this:

iface eth0 inet dhcp
    hostname "mymachine"
    name LAN Interface
    pre-up /sbin/ifconfig $IFACE mtu 1492

[1] See DBTS #294044, DBTS #333519 and DBTS #309205 for more information.

Categories: English, Tech Crunch