Using sockets in PHP : Get articles from Usenet
PHP can open sockets on remote or local hosts. Here is a hands-on example of using
such a socket: getting connected to a Usenet News Server, talking to this server,
and downloading some articles for a precise newsgroup.
Opening a socket in PHP
Sockets are opened using fsockopen(). This function is available in both PHP3 and
PHP4. It uses the following prototype:
<?php
int fsockopen(string hostname, int port [, int errno [, string errstr [, double timeout]]])
?>
For the Internet domain, it will open a TCP socket connection to hostname on port port.
hostname may in this case be either a fully qualified domain name or an IP address. For UDP
connections, you need to explicitly specify the protocol: udp://hostname. For
the Unix domain, hostname will be used as the path to the socket, port must be set to 0 in
this case. The optional timeout can be used to set a timeout in seconds for the connect system call.
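The parameters described above can be tried without any remote host. The following is a minimal sketch, assuming a throwaway listener on 127.0.0.1 created with stream_socket_server(); the listener, the message text and the 2-second timeout are illustrative choices, not part of the original example.

```php
<?php
// Hypothetical local listener standing in for a remote server (port 0 = any free port).
$server = stream_socket_server("tcp://127.0.0.1:0", $srvErrno, $srvErrstr);
if ($server === false) {
    exit("Could not start the test listener: $srvErrstr ($srvErrno)\n");
}
// stream_socket_get_name() returns "127.0.0.1:<port>"; keep only the port.
$name = stream_socket_get_name($server, false);
$port = (int) substr($name, strrpos($name, ":") + 1);

// Client side: TCP connection with a 2-second connect timeout.
// On failure, fsockopen() returns false and fills in $errno and $errstr.
$client = fsockopen("127.0.0.1", $port, $errno, $errstr, 2.0);
if ($client === false) {
    exit("Connection failed: $errstr ($errno)\n");
}

// Server side: accept the pending connection and send one line.
$peer = stream_socket_accept($server, 2.0);
fwrite($peer, "hello from the listener\r\n");

// Back on the client: the socket is read like a file.
$line = fgets($client, 1024);
echo $line;

fclose($peer);
fclose($client);
fclose($server);
```

For a UDP connection, the first argument to fsockopen() would be written "udp://127.0.0.1" instead, as explained above.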
Network News Transfer Protocol
Accessing a Usenet news server requires a specific protocol called NNTP, which stands for Network News Transfer Protocol.
This protocol is described in detail in RFC977 (Request For Comments number 977), which is available at:
http://www.w3.org/Protocols/rfc977/rfc977.html
This document describes precisely how to connect to an NNTP server and then converse with it using the various commands available for the task.
Connecting
Connecting to the NNTP server requires knowing its hostname (or IP address) and the port it is listening on. You should include a timeout so that an unsuccessful attempt at connecting does not "freeze" the application.
<?php
$cfgServer = "your.news.host";
$cfgPort = 119;
$cfgTimeOut = 10;
// open a socket
if (!$cfgTimeOut) {
    // without timeout
    $usenet_handle = fsockopen($cfgServer, $cfgPort);
} else {
    // with timeout; fsockopen() fills in $errno and $errstr on failure
    $usenet_handle = fsockopen($cfgServer, $cfgPort, $errno, $errstr, $cfgTimeOut);
}
if (!$usenet_handle) {
    echo "Connection failed\n";
    exit();
} else {
    echo "Connected\n";
    // read the greeting line sent by the server
    $tmp = fgets($usenet_handle, 1024);
}
?>
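The greeting read into $tmp above carries a status code that is worth checking. RFC977 defines 200 as "server ready, posting allowed" and 201 as "server ready, no posting allowed". The following is a minimal sketch of how the line could be decoded; the helper name nntp_parse_status() is hypothetical, and the sample greeting is the one from the telnet session later in this article.

```php
<?php
// Hypothetical helper: split an NNTP status line into its numeric code and text.
function nntp_parse_status($line)
{
    $line = rtrim($line, "\r\n");
    return array(
        "code" => (int) substr($line, 0, 3),
        "text" => (string) substr($line, 4),
    );
}

// Sample greeting; with a live connection this would be the $tmp read after connecting.
$greeting = "200 my.news.host InterNetNews NNRP server INN 2.2.2 13-Dec-1999 ready (posting ok).\r\n";
$status = nntp_parse_status($greeting);

if ($status["code"] == 200) {
    echo "Server ready, posting allowed\n";
} elseif ($status["code"] == 201) {
    echo "Server ready, posting not allowed\n";
} else {
    echo "Unexpected greeting: " . $status["text"] . "\n";
}
```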
Talking to the Server
We are now connected to the server, and can talk to it through the previously opened socket. Let us say we want to get the 10 latest articles from some newsgroup. RFC977 specifies that the first step is to select the right newsgroup with the GROUP command:
GROUP ggg
The required parameter ggg is the name of the newsgroup to be selected (e.g. "net.news"). A list of valid newsgroups may be obtained from the LIST command.
The successful selection response will return the article numbers of the first and last articles in the group, and an estimate of the number of articles on file in the group.
Example:
chrome:~$ telnet my.news.host 119
Trying aa.bb.cc.dd...
Connected to my.news.host.
Escape character is '^]'.
200 my.news.host InterNetNews NNRP server INN 2.2.2 13-Dec-1999 ready (posting ok).
GROUP alt.test
211 232 222996 223235 alt.test
quit
205 .
After receiving the command "GROUP alt.test", the news server answered "211 232 222996 223235 alt.test". 211 is an RFC-defined code (basically saying the command was successfully executed; check the RFC for more details). The rest of the line says the server currently has 232 articles, numbered from 222996 for the oldest through 223235 for the latest. These are called article numbers. Now, let us do the count: 222996 + 232 by no means equals 223235. The range 222996 through 223235 covers 240 numbers, so eight articles are missing. They were removed from the server one way or another, either cancelled by their legitimate author (yes, it is possible and easy to do!) or deleted after a report of abuse, for example.
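The arithmetic on the GROUP response can be checked with a few lines of code. The following is a minimal sketch, assuming the response has exactly the "211 count first last name" layout shown above; the helper name nntp_parse_group() is hypothetical. Keep in mind that RFC977 describes the article count as an estimate, so the computed gap is only indicative.

```php
<?php
// Hypothetical helper: decode a successful "211 count first last name" response.
// Returns false for any other response.
function nntp_parse_group($line)
{
    $parts = explode(" ", rtrim($line, "\r\n"));
    if (count($parts) < 5 || $parts[0] !== "211") {
        return false;
    }
    return array(
        "count" => (int) $parts[1],
        "first" => (int) $parts[2],
        "last"  => (int) $parts[3],
        "name"  => $parts[4],
    );
}

$group = nntp_parse_group("211 232 222996 223235 alt.test\r\n");
$span = $group["last"] - $group["first"] + 1;   // article numbers in the range, both ends included
$gaps = $span - $group["count"];                // numbers with no article behind them
echo "Range spans $span numbers, $gaps of them unused\n";
```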
Be careful though, the server might require authentication before selecting the newsgroup, depending on whether it is a public or private server. It could also let anybody retrieve articles but require authentication to publish an article.
<?php
//$cfgUser = "xxxxxx";
//$cfgPasswd = "yyyyyy";
$cfgNewsGroup = "alt.php";
// identification required on private server
if (!empty($cfgUser)) {
    // NNTP commands are terminated by CRLF
    fputs($usenet_handle, "AUTHINFO USER ".$cfgUser."\r\n");
    $tmp = fgets($usenet_handle, 1024);
    fputs($usenet_handle, "AUTHINFO PASS ".$cfgPasswd."\r\n");
    $tmp = fgets($usenet_handle, 1024);
    // check error: only the code matters, the text after it varies by server
    if (substr($tmp, 0, 3) != "281") {
        echo "502 Authentication error\n";
        exit();
    }
}
// select newsgroup
fputs($usenet_handle, "GROUP ".$cfgNewsGroup."\r\n");
$tmp = fgets($usenet_handle, 1024);
// anything other than 211 is a failure (e.g. 480 authentication required)
if (substr($tmp, 0, 3) != "211") {
    echo "$tmp\n";
    exit();
}
// response layout: 211 count first last group
$info = explode(" ", $tmp);
$first = (int) $info[2];
$last = (int) $info[3];
print "First : $first\n";
print "Last : $last\n";
?>
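Every exchange above follows the same pattern: write one command, read one status line, look at the three-digit code. The following is a minimal sketch that wraps this pattern in a function; the helper name nntp_command() is hypothetical, and a local socket pair with a canned reply stands in for the news server so that the example runs without network access (this assumes a Unix-like system).

```php
<?php
// Hypothetical helper: send one command terminated by CRLF and return
// array(status code, full response line). The code is 0 if nothing was read.
function nntp_command($handle, $command)
{
    fwrite($handle, $command . "\r\n");
    $line = fgets($handle, 1024);
    if ($line === false) {
        return array(0, "");
    }
    return array((int) substr($line, 0, 3), rtrim($line, "\r\n"));
}

// Stand-in for a real server: a connected socket pair with a reply already
// queued on the "server" end.
list($client, $server) = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
fwrite($server, "211 232 222996 223235 alt.test\r\n");

list($code, $text) = nntp_command($client, "GROUP alt.test");
$sent = fgets($server, 1024);   // what the server end actually received

if ($code == 211) {
    echo "Group selected: $text\n";
} elseif ($code == 480) {
    echo "Authentication required\n";
} else {
    echo "Unexpected response: $text\n";
}

fclose($client);
fclose($server);
```

With a live connection, the same call would read nntp_command($usenet_handle, "GROUP ".$cfgNewsGroup).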
Getting some articles
Now that we have the article number of the latest article, it is easy to get the latest ten articles. RFC977 says the ARTICLE command can be used with either the article number or the Message ID.
Be careful here. The article number is different from the Message ID: every news server assigns its own article numbers, so the same article will not have the same number on two different news servers, whereas the Message ID, included in the article's header, is unique.
<?php
$cfgLimit = 10;
// download last articles ("+ 1" so that exactly $cfgLimit numbers are requested)
$boucle = max($first, $last - $cfgLimit + 1);
while ($boucle <= $last) {
    set_time_limit(0);
    fputs($usenet_handle, "ARTICLE $boucle\r\n");
    $article = "";
    $tmp = fgets($usenet_handle, 4096);
    if (substr($tmp, 0, 3) != "220") {
        echo "+----------------------+\n";
        echo "Error on article $boucle\n";
        echo "+----------------------+\n";
    } else {
        // read until the line holding a single dot, which ends the article
        $tmp = fgets($usenet_handle, 4096);
        while ($tmp !== false && $tmp != ".\r\n") {
            $article = $article.$tmp;
            $tmp = fgets($usenet_handle, 4096);
        }
        echo "+----------------------+\n";
        echo "Article $boucle\n";
        echo "+----------------------+\n";
        echo "$article\n";
    }
    $boucle++;
}
?>
We just retrieved the ten latest articles available for this newsgroup on this server. It is also possible to get only the article's header, using the HEAD command, or only the text, using the BODY command.
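ARTICLE, HEAD and BODY all return a multi-line response that ends with a line holding a single dot. Per RFC977, the server doubles the leading dot of any text line that starts with one, so the client should strip it again. The following is a minimal sketch of such a reader; the helper name nntp_read_block() is hypothetical, and a made-up article held in a php://memory stream stands in for the socket.

```php
<?php
// Hypothetical helper: read a multi-line NNTP response up to the lone "."
// line, undoing the dot-stuffing applied by the server.
function nntp_read_block($handle)
{
    $lines = array();
    while (($line = fgets($handle, 4096)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === ".") {
            break;                      // end of the multi-line response
        }
        if (substr($line, 0, 1) === ".") {
            $line = substr($line, 1);   // remove the extra leading dot
        }
        $lines[] = $line;
    }
    return $lines;
}

// Made-up article held in memory so the sketch runs without a server.
$handle = fopen("php://memory", "w+");
fwrite($handle, "Subject: test\r\nMessage-ID: <1@example.invalid>\r\n\r\nHello\r\n..hidden dot\r\n.\r\n");
rewind($handle);

$lines = nntp_read_block($handle);
fclose($handle);

// The first empty line separates the header from the text.
$blank = array_search("", $lines, true);
if ($blank === false) {
    $blank = count($lines);             // no text part (e.g. after HEAD)
}
$headers = array_slice($lines, 0, $blank);
$body = array_slice($lines, $blank + 1);

print_r($headers);
print_r($body);
```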
Closing the connection
To end the session with the NNTP server, just close the socket using fclose() as you would close a file.
<?php
// close connection
fclose($usenet_handle);
?>
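A more polite way to leave is to send the QUIT command first, as in the telnet session shown earlier, where the server answers with code 205. The following is a minimal sketch; the helper name nntp_quit() is hypothetical, and a local socket pair with a canned reply stands in for the news server (this assumes a Unix-like system).

```php
<?php
// Hypothetical helper: send QUIT, read the reply, then close the socket.
// Returns true when the server answered 205 (closing connection).
function nntp_quit($handle)
{
    fwrite($handle, "QUIT\r\n");
    $reply = fgets($handle, 1024);
    fclose($handle);
    return $reply !== false && substr($reply, 0, 3) === "205";
}

// Stand-in for a real server: a socket pair with the reply already queued.
list($client, $server) = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
fwrite($server, "205 .\r\n");

$ok = nntp_quit($client);
echo $ok ? "Session closed cleanly\n" : "Server did not acknowledge QUIT\n";
fclose($server);
```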
Conclusion
We just saw how to open, use and then close a socket in a specific context: connecting to an NNTP server and getting back some newsgroup articles. Posting articles to an NNTP server using the POST command is not much more complicated.
The next step is therefore coding an HTML news client (and getting rid of Netscape :p).
It is also very easy to store the articles and index them using a search engine such as ht://dig (http://www.htdig.org/), which gives you a web-based application for keyword searching in some newsgroups.
An example of such an application is available at
http://www.phpindex.com/ng/
-- Armel