Archivers

Mark H. Solsman
Documentation Training and Publications, Center for Academic Computing
mhs108@psu.edu
9/14/93

Introduction

Program and data file archives exist in many forms today. Public-access bulletin-board systems (BBS), BITNET-based LISTSERV systems, and Internet-based anonymous FTP sites are three of the more popular places where files are stored for remote access and retrieval by computer users.

Archiving and compressing are separate functions. Depending on how it was designed, a single program may perform both, or each may be performed by its own program. A third function, encoding, is also sometimes used. These functions are defined as follows:

Archiving means putting a number of files into a single file for convenience. Many programs are actually composed of multiple files, so a program is much easier to retrieve if its component files have been archived into a single file. Getting the individual files out of an archive is known as "extracting" and is performed by the archiving program.

Compressing means re-encoding the contents of a file so that it takes up fewer bytes, typically by replacing repeated patterns with shorter codes. This is done to reduce storage and transfer costs, and various compression algorithms may be used to achieve it. A compressed file is typically unreadable or, in the case of a program, unexecutable. The opposite function, decompressing, recovers the original file.

Encoding means converting a file, usually a binary file, into a form that can pass safely through networks and mail systems designed to carry only text. For example, binary files are often encoded as printable text characters and transmitted as text files through BITNET. They must be decoded when received. The encoding keeps the file from being corrupted by systems along the path from sender to recipient that cannot handle binary data.

The remainder of this document describes the most popular archiving, compressing, and encoding tools used on DOS, Macintosh, and Unix platforms.
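To make the difference among the three functions concrete, here is a minimal Python sketch (an illustration, not a tool from this handout) that performs each step separately on in-memory data: tar for archiving, gzip for compression, and uuencode-style lines for encoding. gzip is a stand-in chosen only because Python's standard library has no implementation of Unix compress; the function names archive, extract, compress and encode are hypothetical.

```python
import binascii
import gzip
import io
import tarfile
from typing import Dict, List


def archive(files: Dict[str, bytes]) -> bytes:
    """Archiving: bundle several named files into one tar archive, uncompressed."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:  # plain "w" means no compression
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def extract(archive_bytes: bytes) -> Dict[str, bytes]:
    """Extracting: recover the individual files from a tar archive."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar.getmembers():
            out[member.name] = tar.extractfile(member).read()
    return out


def compress(data: bytes) -> bytes:
    """Compressing: gzip (DEFLATE) stands in here for Unix compress (LZW)."""
    return gzip.compress(data)


def encode(data: bytes) -> List[bytes]:
    """Encoding: uuencode-style lines of printable ASCII, 45 input bytes per line."""
    return [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]
```

Each step is undone separately: the encoded lines are decoded with binascii.a2b_uu, the result is decompressed with gzip.decompress, and the files are then recovered with extract. A single program that both archives and compresses (as ZIP does) simply combines the first two steps.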
DOS Systems

Many archiving programs exist for DOS-based personal computers. These programs typically also incorporate compression. The most popular are ARC, ZIP, PAK and ZOO, all of which are distributed as SHAREWARE. If you receive an archive created by one of these programs, its file extension will identify the kind of archiving performed on it. For example, a file archived with ZIP will have the extension .ZIP.

In some cases (such as PCLIB archives) an archive is made "self-extracting." Such an archive has an .EXE extension and includes the extraction code, so to extract its files you simply execute the archive file.

Each of these DOS archiving programs is available from the PSUVM file server PCLIB under the CAC-PC library and the ARC directory.

Macintosh Systems

The most popular archiving programs for the Macintosh, Stuffit, Packit and Compact Pro, also incorporate compression. Archive files created with these programs typically have .sit, .pit, or .cpt appended to the file name.

The most popular encoder for the Macintosh is Binhex. Files encoded with Binhex typically have .hqx appended to the file name.

SHAREWARE and commercial versions of each of these programs are available from the PSUVM file server PCLIB under the CAC-MAC library and the UTIL directory.

Unix Systems

The predominant archiver on Unix systems is tar. Tar archives typically have .tar appended to the file name. DOS versions of tar are also available; a tar extractor for DOS systems is available from the PSUVM file server PCLIB under the CAC-PC library and the ARC directory. A PSUVM version of tar is also available. Issue the command HELP TAR for more information on this version.

Compress is the predominant compression program for Unix systems. Its counterpart, uncompress, exists as a separate program on Unix systems.
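Because the file extension is the main clue to how a file was processed, the naming conventions in this handout can be collected into a lookup table. The Python sketch below is illustrative only: the function name identify and the table are hypothetical, and since extensions are conventions rather than guarantees, the result is a guess about the file's contents. It peels suffixes from the right, so a name such as prog.tar.Z yields the steps in the order they must be undone.

```python
from typing import List

# File-name conventions described in this handout. Extensions are only
# conventions, so the result is a guess, not a check of the file's contents.
CONVENTIONS = {
    ".arc": "ARC archive (DOS)",
    ".zip": "ZIP archive (DOS)",
    ".pak": "PAK archive (DOS)",
    ".zoo": "ZOO archive (DOS)",
    ".exe": "possibly a self-extracting archive (DOS)",
    ".sit": "Stuffit archive (Macintosh)",
    ".pit": "Packit archive (Macintosh)",
    ".cpt": "Compact Pro archive (Macintosh)",
    ".hqx": "Binhex-encoded file (Macintosh)",
    ".tar": "tar archive (Unix)",
    ".uue": "uuencoded file (Unix)",
}


def identify(filename: str) -> List[str]:
    """Peel recognized suffixes off the right-hand end of a file name.

    The list is in the order the steps must be undone; for "prog.tar.Z"
    that means uncompress first, then extract with tar.
    """
    steps = []
    name = filename
    while "." in name:
        stem, dot, ext = name.rpartition(".")
        suffix = dot + ext
        if suffix == ".Z":  # compress output; the capital Z matters on Unix
            steps.append("compress (.Z) file (Unix)")
        elif suffix.lower() in CONVENTIONS:  # DOS names are case-insensitive
            steps.append(CONVENTIONS[suffix.lower()])
        else:
            break  # unknown suffix: stop guessing
        name = stem
    return steps
```

For example, identify("app.sit.hqx") reports the Binhex encoding first and the Stuffit archive second, matching the order in which a Macintosh user would decode and then extract the file.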
Files compressed with compress typically have .Z appended to the file name. U-COMP and U-DCOMP are PSUVM versions of compress and uncompress located on the PSUVM STAGE disk. To access these programs, issue the commands:

product stage
product share

You may then read the help files for these programs with HELP U-COMP and HELP U-DCOMP.

Uuencode and uudecode are the primary encoder and decoder used on Unix systems. Files encoded with uuencode typically have .uue appended to the file name. The PSUVM program ARCUTIL contains options for uuencode and uudecode; it also supports the xxencode, ARC and ZIP formats. Issue the command HELP ARCUTIL for more information.

Important information for PSUVM users

Many Penn State faculty, staff and students use PSUVM to retrieve programs and data files from BITNET LISTSERV and Internet archive sites. Occasionally programs are also obtained from NETNEWS postings.

Programs obtained from BITNET LISTSERV archives or NETNEWS postings are typically encoded. If the program is intended for a Macintosh, it will typically be encoded with Binhex. In this case, strip any mailer headers and trailers from the file, download it as text to your Macintosh, and run Binhex to decode it. If the program is encoded with uuencode or xxencode, your best bet is to use ARCUTIL to decode the file before transferring it to its destination system for extraction or decompression.

Programs and data files obtained from Internet archives are often archived with tar and compressed with compress. If the program is destined for a Unix system, your best strategy is to transfer the file to that system (using a binary transfer method) and decompress and extract it there.

Textual data files obtained from Internet archives pose a special problem for PSUVM users. Even after U-DCOMP has been used to decompress these files and TAR to extract them, they may still be unreadable.
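For readers curious about what a uudecoder does with a file pulled from a LISTSERV or NETNEWS posting, here is a minimal Python sketch using the standard binascii module. It ignores everything before the "begin" line and after the "end" line, which has the same effect as stripping the mailer headers and trailers by hand. The function uudecode is a hypothetical helper, not ARCUTIL, and it does not handle xxencode or files split across several postings.

```python
import binascii
from typing import Iterable, Tuple


def uudecode(lines: Iterable[str]) -> Tuple[str, bytes]:
    """Decode one uuencoded file, skipping mail headers and trailers.

    Lines before "begin <mode> <name>" and after "end" are ignored.
    Returns the file name from the begin line and the decoded bytes.
    """
    name = None
    data = bytearray()
    for line in lines:
        if name is None:
            if line.startswith("begin "):
                parts = line.split(maxsplit=2)
                name = parts[2].strip() if len(parts) == 3 else "unnamed"
            continue  # still inside the mail header
        if line.strip() == "end":
            return name, bytes(data)
        if line.strip():  # skip blank lines that some mailers insert
            data += binascii.a2b_uu(line.rstrip("\r\n"))
    raise ValueError("no complete begin/end block found")
```

The first character of each encoded line gives the number of bytes on that line, which is why binascii.a2b_uu can decode each line independently.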
If a text file is still unreadable after decompression and extraction, the original file probably came from a computer using the ASCII character set, which makes it unreadable on PSUVM because PSUVM uses the EBCDIC character set. In that case, the following pipe command converts the file from ASCII to EBCDIC and breaks it into records at the ASCII line-end character (X'0A'):

pipe < infile | deblock linend 0a | xlate a2e | > outfile

Here infile is the name of the file to be converted and outfile is the intended name for the converted file.

Getting help

Most of the programs described in this handout come with help files or have help summaries built in. If you are having trouble getting or using an archiver, compressor or encoder, contact the CAC Help Desk:

CAC Help Desk, 12 Willard Building, (814) 863-1035
email: HELPDESK
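For comparison, the deblock and xlate stages of the ASCII-to-EBCDIC pipe command can be approximated in Python. This is a sketch under stated assumptions: it treats X'0A' as the only line end, and it uses Python's cp037 codec (EBCDIC code page 037, US/Canada) as the translation table, which may not match the table xlate a2e uses on PSUVM. The function name is hypothetical.

```python
from typing import List


def ascii_to_ebcdic_records(data: bytes, codepage: str = "cp037") -> List[bytes]:
    """Split ASCII data into records at X'0A' and translate each to EBCDIC.

    Roughly mirrors 'deblock linend 0a' followed by 'xlate a2e'.
    The default code page 037 is an assumption, not PSUVM's documented table.
    """
    lines = data.split(b"\x0a")  # deblock linend 0a: split at line feeds
    if lines and lines[-1] == b"":
        lines.pop()  # a final line feed does not start another record
    # xlate a2e: ASCII text -> EBCDIC bytes (raises on non-ASCII input)
    return [line.decode("ascii").encode(codepage) for line in lines]
```

Splitting into records matters as much as the translation: CMS files are made of records, so without the deblock step the whole ASCII file would arrive as one long record.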