:: RootR ::  Hosting Order Map Login   Secure Inter-Network Operations  
 
WWW(3) - phpMan

Command: man perldoc info search(apropos)  


WWW(3)                               Library Functions Manual                              WWW(3)



NAME
       WWW - World Wide Web Package

SYNOPSIS
       extract_description( FILE )
       extract_meta( FILE, NAME )
       hyperlink( LIST )

DESCRIPTION
       This  package  provides a utility functions for the World Wide Web to extract descriptions
       of or meta information from files, and hyperlink text.

SUBROUTINES
       The following Perl subroutines are defined and available:

       extract_description( FILE )
              Extracts a description from an HTML or plain text file given by the FILE name; FILE
              should  be an absolute path.  The first $description::chars (default: 2048) charac‐
              ters are read.  If the file ends in one of the extensions htm, html, or  shtml,  it
              is  presumed  to  be  an HTML file; if the file ends in txt, it is presumed to be a
              plain text file.  Other  extensions  are  not  recognized  and  no  description  is
              returned for them.

              For  HTML  files,  first,  if  a <META NAME="description" CONTENT="..."> or a <META
              NAME="DC.description" CONTENT="..."> (Dublin Core) element is found, then the words
              specified as the value of the CONTENT attribute is returned as the description.

              Otherwise, all HTML comments, text between <SCRIPT>, <STYLE>, and <TITLE> tags, and
              all other HTML tags are stripped.  If <AREA ... ALT="..."> or <IMG  ...  ALT="...">
              elements are found, then the words specified as the value of the ALT attributes are
              extracted.

              Finally, for either HTML or plain text files, at most $description::words (default:
              50) are returned.

       extract_meta( FILE, NAME )
              Extracts  the  value  of the CONTENT attribute from a META element having the given
              NAME attribute from an HTML file given by the FILE name; FILE should be an absolute
              path.  The file must end in one of the extensions htm, html, or shtml to be consid‐
              ered an HTML file.  The first $description::chars (default:  2048)  characters  are
              read.  The characters are cached between consecutive calls using the same filename.

       hyperlink( LIST )
              Adds  hyperlinks to strings: that is strings that contain substrings that are valid
              URLs (according to RFC 1630) have the appropriate HTML tags ``wrapped'' around them
              so  that  they  will  be  selectable when displayed in a browser.  The ftp, gopher,
              http, https, mailto, news, telnet, and wais URLs are recognized.  Example:

                 Read all about it at
                 http://www.usatoday.com/

            becomes:

                 Read all about it at
                 <A HREF="http://www.usatoday.com/">http://www.usatoday.com/</A>


SEE ALSO
       perl(1)

       Tim Berners-Lee.  ``Universal Resource Identifiers in WWW,'' Request  for  Comments  1630,
       Network Working Group of the Internet Engineering Task Force, June 1994.

       Tim  Berners-Lee, Larry Masinter, and Mark McCahill.  ``Uniform Resource Locators (URL),''
       Request for Comments 1738, Network Working Group, 1994.

       Dave Raggett, Arnaud Le Hors, and Ian Jacobs.  ``Notes on  helping  search  engines  index
       your  Web  site,''  HTML  4.0  Specification, Appendix B: Performance, Implementation, and
       Design Notes, World Wide Web Consortium, April 1998.

       --.  ``Objects, Images, and Applets: How to specify alternate text,'' HTML 4.0  Specifica‐
       tion, §13.8, World Wide Web Consortium, April 1998.

       Dublin  Core Directorate.  ``The Dublin Core: A Simple Content Description Model for Elec‐
       tronic Resources.''

       Larry Wall, et al.  Programming Perl, 3rd ed., O'Reilly &  Associates,  Inc.,  Sebastopol,
       CA, 2000.

AUTHOR
       Paul J. Lucas <pauljlucas AT mac.com>



WWW                                     February 12, 2000                                  WWW(3)


/man
rootr.net - man pages