Unix Files

Unix File Structure:
File: A file is a container for storing information. As a first approximation, we can treat it simply as a sequence of characters. If you name a file foo and write three characters a, b and c into it, then foo will contain only the string abc and nothing else. Unlike the old DOS files, a UNIX file doesn’t contain an eof (end-of-file) mark. Neither a file’s size nor its name is stored in the file itself. All file attributes are kept in a separate area of the hard disk, not directly accessible to humans, but only to the kernel.
Unix treats directories and devices as files as well. A directory is simply a folder where you store filenames and other directories. All physical devices like the hard disk, memory, CD-ROM, printer and modem are treated as files. The shell is also a file, and so is the kernel. And if you are wondering how UNIX treats the main memory in your system, it’s a file too!
So we have already divided files into three categories:
Ordinary file:  Also known as regular file. It contains only data as a stream of characters.
Directory file:  It’s commonly said that a directory contains files and other directories, but strictly speaking, it contains their names and a number associated with each name.
Device file:  All devices and peripherals are represented by files. To read or write a device, you have to perform these operations on the associated file.
Ordinary (Regular) File:
An ordinary file or regular file is the most common file type. All programs you write belong to this type. An ordinary file itself can be divided into two types:
  • Text file
  • Binary file
A text file contains only printable characters, and you can often view the contents and make sense out of them. All C and Java program sources, shell and Perl scripts are text files. A text file contains lines of characters where every line is terminated with the newline character, also known as linefeed (LF). When you press [Enter] while inserting text, the LF character is appended to the line. You won’t see this character normally, but there is a command (od) which can make it visible.
A binary file, on the other hand, contains both printable and nonprintable characters that cover the entire range of byte values (0 to 255). Most UNIX commands are binary files, and the object code and executables that you produce by compiling C programs are also binary files. Pictures, sound and video files are binary files as well. Displaying such files with a simple cat command produces unreadable output and may even disturb your terminal’s settings.
Directory File:
A directory contains no data, but keeps some details of the files and subdirectories that it contains. The Unix file system is organized with a number of directories and subdirectories, and you can also create them as and when you need. You often need to do that to group a set of files pertaining to a specific application. This allows two or more files in separate directories to have the same file name.
A directory file contains an entry for every file and subdirectory that it houses. If you have 20 files in a directory, there will be 20 entries in the directory. Each entry has two components:
  • The filename.
  • A unique identification number for the file or directory (called the inode number).
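The two components of a directory entry can be read from a program. The following is a minimal sketch, assuming the POSIX opendir()/readdir() interface; the function name list_entries is ours, chosen for illustration. Note that readdir() also reports the two special entries . and .., so the count is usually two more than the number of files you created.

```c
#include <dirent.h>
#include <stdio.h>

/* Print "inode  name" for every entry in dirpath.
   Returns the number of entries seen, or -1 if the directory
   cannot be opened. (list_entries is an illustrative name.) */
int list_entries(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* d_ino is the inode number, d_name is the filename */
        printf("%10lu  %s\n", (unsigned long)entry->d_ino, entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Calling list_entries(".") from main() prints one line per entry in the current directory.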
Device File:
Device filenames are generally found inside a single directory structure, /dev. A device file is indeed special; it’s not really a stream of characters. In fact, it doesn’t contain anything at all. The operation of a device is entirely governed by the attributes of its associated file. The kernel identifies a device from these attributes and uses them to operate the device.
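Since a device is recognized from its file attributes, a program can ask the kernel for those attributes and test the file type. This is a small sketch, assuming the POSIX stat() call and the S_ISCHR/S_ISBLK macros; is_device_file is a hypothetical helper name.

```c
#include <sys/stat.h>

/* Returns 1 if path is a character or block device file,
   0 if it is some other kind of file, and -1 if stat() fails.
   (is_device_file is an illustrative name.) */
int is_device_file(const char *path)
{
    struct stat sb;
    if (stat(path, &sb) == -1)
        return -1;
    return (S_ISCHR(sb.st_mode) || S_ISBLK(sb.st_mode)) ? 1 : 0;
}
```

On most systems is_device_file("/dev/null") reports a device, while the root directory does not.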
What’s in a (FILE)Name?
On most UNIX systems today, a filename can consist of up to 255 characters, though this figure is normally not reached. Files may or may not have extensions, and can consist of practically any ASCII character except the / and the NUL character (ASCII value 0). You are permitted to use control characters or other unprintable characters in a filename. The following are valid filenames in UNIX.
.last_time    list.    ^V^B^D    -{}[]    @#$*abcd    a.b.c.d.e.
The third filename contains three control characters ([Ctrl-v] being the first). These characters should definitely be avoided in framing filenames. Moreover, since the UNIX system has a special treatment for characters like $, `, ?, *, & among others, it is recommended that only the following characters be used in filenames:
  • Alphabetic characters and numerals.
  • The period (.), hyphen (-) and underscore (_).
UNIX imposes no rules for framing filename extensions. A shell script doesn’t need to have the .sh extension even though it helps in identification. In all cases, it’s the application that imposes the restriction.
The File System Hierarchy:
All files in UNIX are “related” to one another. The file system in UNIX is a collection of all of these related files (ordinary, directory and device files) organized in a hierarchical (inverted tree) structure. This system has also been adopted by DOS and Windows, and is visually represented in the figure below.
The implicit feature of every UNIX file system is that there is a top, which serves as the reference point for all files. This top is called root and is represented by a / (forward slash). Root is actually a directory. It is conceptually different from the user-id root used by the system administrator to log in. We’ll be using both the name “root” and the symbol / to represent the root directory.
The root directory (/) has a number of subdirectories under it. These subdirectories in turn have more subdirectories and other files under them. For instance, bin and usr are two directories directly under /.
We can specify the relationship login.sql has with root by a pathname: /home/romeo/login.sql. The first / represents the root directory and the remaining /s act as delimiters of the pathname components. This pathname is appropriately referred to as an absolute pathname because by using root as the ultimate reference point we can specify a file’s location in an absolute manner.
The entire file system is divided into two groups. The first group contains the files that are made available during system installation:
/bin & /usr/bin: These directories contain commonly used UNIX commands.
/sbin & /usr/sbin: These directories contain commands that only the system administrator can execute.
/dev: This directory contains device files.
/lib & /usr/lib: These directories contain all library files in binary form.
/etc: This directory contains the configuration files of the system.
/usr/local: This directory contains its own bin and lib subdirectories.
            Users also work with their own files; they write programs, send and receive mail and also create temporary files. These files are available in the second group shown below:
/home: It contains directories of different users.
/tmp: This directory contains temporary files created by users.
/var: It is the variable part of the file system. It contains log files, print jobs, and outgoing and incoming mail.

System Calls:
            In the UNIX operating system, applications do not have direct access to the computer hardware (say a hard-drive). Applications have to request hardware access from a third-party that mediates all access to computer resources, the Kernel.
As you can see from the above diagram, the Kernel is a central component of most computer operating systems.
If processes cannot directly access hardware resources, there must be a way to switch from the User Space to the Kernel Space. A user application can explicitly request to transition to Kernel Space by issuing a system call. Each system call provides a basic operation such as opening a file, getting the current time, creating a new process, or reading a character. In this way system calls can be viewed as regular function calls.
            A system call is just what its name implies — a request for the operating system to do something on behalf of the user’s program.
            
When a program invokes a system call, it is interrupted and the system switches to Kernel space. The Kernel then saves the process execution context (so that it can resume the program later) and determines what is being requested. The Kernel carefully checks that the request is valid and that the process invoking the system call has enough privilege. If everything is good, the Kernel processes the request in Kernel Mode and can access the device drivers in charge of controlling the hardware (e.g. reading a character inputted from the keyboard). The Kernel can read and modify the data of the calling process as it has access to memory in User Space. When the Kernel is done processing the request, it restores the process execution context that was saved when the system call was invoked, and control returns to the calling program which continues executing.
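From the program’s side, the kernel’s validity and privilege checks show up as the return value of the call. When a request is rejected, a system call conventionally returns -1 and stores the reason in errno. The following is a minimal sketch of that convention using open(); try_open is a hypothetical helper name.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Ask the kernel to open path read-only.
   Returns 0 if the request was accepted, otherwise the errno
   value explaining why the kernel refused it.
   (try_open is an illustrative name.) */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;      /* e.g. ENOENT: no such file or directory */
    close(fd);
    return 0;
}
```

For a pathname that does not exist, try_open() returns ENOENT; for a file the caller may not read, it returns EACCES.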
Unix has a variety of system calls for carrying out file operations. The standard I/O library in Unix (also called the stdio library), which offers a high-level interface for file operations through functions such as printf and putchar, is built on top of this lower-level interface. Typical low-level system calls are:
·         int creat(const char *path, mode_t mode);
·         int unlink(const char *path);
·         int rename(const char *old, const char *new);
·         int open(const char *path, int flags, mode_t mode);
·         int close(int fd);
·         ssize_t read(int fd, void *buf, size_t nbyte);
·         ssize_t write(int fd, const void *buf, size_t nbyte);
·         off_t lseek(int fd, off_t offset, int whence);
·         int link(const char *path1, const char *path2);
·         int chmod(const char *path, mode_t mode);
·         int stat(const char *path, struct stat *buf);
·         int fstat(int fd, struct stat *buf);
·         int lstat(const char *path, struct stat *buf);
Library Functions:
The difference between system calls and library functions is that system calls usually provide a low-level interface, whereas library functions often provide a high-level interface. A low-level interface means we have to supply more details to perform the required service. A high-level interface provides more functionality within one statement, so we can write code in less time and give fewer details to perform the required service.
An application can call either a system call or a library function. Also realize that many library functions invoke a system call. 
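To make the contrast concrete, here is a short sketch that produces the same kind of output both ways. With write() we must supply the descriptor, the buffer and the exact byte count ourselves; with fprintf() the stdio library does the formatting and buffering and eventually invokes write() for us. The two function names are ours, for illustration only.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Low level: the caller names the descriptor and the exact length.
   Returns the number of bytes written or -1 on error. */
ssize_t say_with_syscall(int fd, const char *msg)
{
    return write(fd, msg, strlen(msg));
}

/* High level: stdio formats the text, buffers it, and calls
   write() itself when the buffer is flushed.
   Returns the number of characters printed. */
int say_with_library(FILE *stream, const char *name)
{
    int n = fprintf(stream, "hello, %s\n", name);
    fflush(stream);
    return n;
}
```

For example, say_with_syscall(1, "hello, world\n") and say_with_library(stdout, "world") both put the same 13 bytes on standard output.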
Open: The open system call opens the file in the mode given by flags, and returns a file descriptor. This is an integer which the O/S uses to index into a per-process file descriptor table, which is used to keep information about open files. By convention, 0 is stdin, 1 is stdout and 2 is stderr. The flags argument is a bitwise OR of a number of constants, exactly one of which must be O_RDONLY, O_WRONLY or O_RDWR:
·         O_RDONLY ( reading )
·         O_WRONLY ( writing )
·         O_RDWR (read & write)
·         O_APPEND
·         O_CREAT
·         O_TRUNC
O_CREAT is used to create a new file if the given file doesn’t exist. When O_CREAT is one of the flags, open takes an additional argument giving the file permission mode.
O_APPEND makes every write add its data to the end of the existing file.
O_TRUNC discards the existing contents of the file. After opening with O_TRUNC, the file contains 0 bytes.
Syntax:
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *path, int flags, mode_t mode);
Returns file descriptor or -1 on error.
The different types of permissions under mode_t mode are
·         S_IRUSR            00400
·         S_IWUSR          00200
·         S_IXUSR           00100
·         S_IRGRP           00040
·         S_IWGRP          00020
·         S_IXGRP            00010
·         S_IROTH          00004
·         S_IWOTH         00002
·         S_IXOTH          00001
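Eg:
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(void)
{
    int fd;
    /* open for reading and writing; if the file is absent, create it
       with read, write and execute permission for the owner (00700) */
    fd = open("/home/user/sample", O_RDWR | O_CREAT, 00700);
    printf("File : %d\n", fd);   /* prints -1 if the open failed */
    return 0;
}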
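The effect of O_APPEND and O_TRUNC is easiest to see by watching the file size. The following sketch, under the assumption that the file lives somewhere writable, opens a file with one of the two flags, writes some text, and reports the resulting size. The helper name write_with_flag is hypothetical, and fstat() (described later in this article) is used only to read the size.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write text to path using O_WRONLY | O_CREAT plus extra_flag
   (O_APPEND or O_TRUNC), then return the file's size in bytes,
   or -1 on error. (write_with_flag is an illustrative name.) */
long write_with_flag(const char *path, int extra_flag, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | extra_flag, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    size_t len = strlen(text);
    struct stat sb;
    long size = -1;
    if (write(fd, text, len) == (ssize_t)len && fstat(fd, &sb) == 0)
        size = (long)sb.st_size;
    close(fd);
    return size;
}
```

Writing "abc" with O_TRUNC leaves a 3-byte file; writing "de" with O_APPEND afterwards grows it to 5 bytes; writing "z" with O_TRUNC again shrinks it to 1 byte.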
Creat: The creat system call takes the pathname of a file and creates a new file, discarding the contents if it already exists. The mode sets the access permissions.
Syntax:
#include <sys/stat.h>
#include <fcntl.h>
int creat(const char *path, mode_t mode);
Returns file descriptor or -1 on error.
Eg.       int fd;
fd = creat("/home/user/sample", 00700);
Calling creat is equivalent to calling open with the flags
   O_WRONLY
   O_CREAT
   O_TRUNC
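The equivalence between creat and open with these three flags can be written down directly. This is a small sketch; both helper names are ours, and the permission bits (read and write for the owner) are an arbitrary choice for the example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Both helpers leave path as an empty file opened for writing and
   return its descriptor, or -1 on error. (Illustrative names.) */
int make_with_creat(const char *path)
{
    return creat(path, S_IRUSR | S_IWUSR);
}

int make_with_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
}
```

Whichever helper is used, an existing file at path ends up with a length of 0 bytes.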
Read: The read system call is a block read of a given number of bytes. The block size can be tuned to the physical characteristics of the device being read from.
Syntax:
#include <unistd.h>
ssize_t read(int fd,          // file descriptor
             void *buf,       // address to receive data
             size_t nbytes);  // amount to read
Returns number of bytes read (0 at end of file) or -1 on error.
Write: The write system call is also a block write of a number of bytes.
Syntax:
#include <unistd.h>
ssize_t write(int fd,           // file descriptor
              const void *buf,  // data to write
              size_t nbytes);   // amount to write
Returns number of bytes written or -1 on error.
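Read and write are typically used together in a loop that moves one block at a time. The sketch below copies a file this way; copy_file is a hypothetical name and the 4096-byte buffer is an arbitrary block size. Because write() is allowed to write fewer bytes than requested, the inner loop repeats until the whole block is out.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy src to dst block by block.
   Returns the number of bytes copied, or -1 on error.
   (copy_file is an illustrative name.) */
long copy_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in == -1)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (out == -1) {
        close(in);
        return -1;
    }

    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {   /* 0 means end of file */
        ssize_t done = 0;
        while (done < n) {                          /* handle short writes */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w == -1) {
                close(in);
                close(out);
                return -1;
            }
            done += w;
        }
        total += n;
    }
    close(in);
    close(out);
    return (n == -1) ? -1 : total;
}
```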
Close: This system call is used to close a file descriptor.
Syntax:
#include <unistd.h>
int close(int fd); // file descriptor
Returns 0 on success or -1 on error.
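Eg:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
int main()
{
    int fd;
    char x[20] = "sample.txt";   /* illustrative file name; any writable path will do */
    close(1);                    /* free descriptor 1 (stdout) */
    fd = open(x, O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);   /* gets the lowest free descriptor */
    printf("File: %d\n", fd);    /* written to the file, which is now stdout */
    fflush(stdout);              /* flush before ls writes to the same descriptor */
    system("ls");
    close(fd);
    return 0;
}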
When we open a file, the open file descriptor table is searched from the first row, and the first empty row (entry) is given to the new file to store its information. This row number is called the file descriptor of the file. For any program, three descriptors are available by default: stdin (0), stdout (1) and stderr (2).
Thus, in the above program, after closing stdout (descriptor 1), the open() system call returns 1 as the file descriptor for the new file, because 1 is now the lowest unused descriptor. When we then execute the “ls” command, its output is written into this file: “ls” sends its output to standard output, whose file descriptor is 1, and descriptor 1 now belongs to the file.
lseek:
The UNIX file system treats an ordinary file as a sequence of bytes. No internal structure is imposed on a file by the operating system. Generally, a file is read or written sequentially, that is, from the beginning to the end of the file. Sometimes sequential reading and writing is not appropriate. It may be inefficient, for instance, to read an entire file just to move to the end of the file to add characters. Fortunately, the UNIX system lets you read and write anywhere in the file. Known as “random access”, this capability is made possible with the lseek() system call. During file I/O, the UNIX system uses a long integer (of type off_t), also called a file pointer, to keep track of the next byte to read or write. This integer represents the number of bytes from the beginning of the file to that next byte. Random access I/O is achieved by changing the value of this file pointer using the lseek() system call.
Syntax:
#include <unistd.h>
off_t lseek(int fd,       // file descriptor
            off_t pos,    // position
            int whence);  // interpretation
Returns new file offset or -1 on error.
The different values for whence are
  1. SEEK_CUR :  The file offset is set to current value plus position argument
  2. SEEK_SET: The file offset is set to the position argument
  3. SEEK_END: The file offset is set to size of the file plus position argument
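As an illustration of random access, the sketch below reads only the last few bytes of a file without reading what comes before them. It uses SEEK_END with a position of 0 to learn the file size, then SEEK_SET to jump to the starting point. The name read_tail is ours, for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read up to n bytes from the end of path into buf.
   Returns the number of bytes read, or -1 on error.
   (read_tail is an illustrative name.) */
ssize_t read_tail(const char *path, char *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    off_t size = lseek(fd, 0, SEEK_END);      /* offset of end == file size */
    if (size == -1) {
        close(fd);
        return -1;
    }
    off_t start = (size > (off_t)n) ? size - (off_t)n : 0;
    if (lseek(fd, start, SEEK_SET) == -1) {   /* jump straight to the tail */
        close(fd);
        return -1;
    }
    ssize_t got = read(fd, buf, n);
    close(fd);
    return got;
}
```

For a file containing 0123456789, read_tail(path, buf, 3) fills buf with the three bytes 789.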
stat:
The system call stat() is used to get file status information, that is, the metadata (statistics) of a file. The related calls fstat() and lstat() return the same information (permissions, owner, file type etc). The three differ only in how the file is named: stat() and lstat() take a pathname while fstat() takes an open file descriptor, and stat() and lstat() treat symbolic links differently. If ‘stat’ is used on a symbolic link, it stats the file the link points to rather than the link itself. If ‘lstat’ is used, the data refer to the link. Thus, to detect a link, we must use ‘lstat’.
All these functions use a data structure known as ‘stat’, which is defined in the file ‘/usr/include/sys/stat.h’. Here are the data members of the structure.
struct stat
{
    dev_t     st_dev;      /* device ID of file system */
    ino_t     st_ino;      /* file inode number */
    mode_t    st_mode;     /* file type and permissions */
    nlink_t   st_nlink;    /* number of hard links to file */
    uid_t     st_uid;      /* user id */
    gid_t     st_gid;      /* group id */
    dev_t     st_rdev;     /* device ID if special file */
    off_t     st_size;     /* size in bytes */
    time_t    st_atime;    /* last accessed time */
    time_t    st_mtime;    /* last data modification time */
    time_t    st_ctime;    /* last i-node modification time */
    blksize_t st_blksize;  /* block size */
    blkcnt_t  st_blocks;   /* number of blocks */
};
Syntax:
#include <sys/stat.h>
int stat(const char *path,   // path name
         struct stat *buf);  // returned information
Returns 0 on success or -1 on error.
fstat: It is used to get file information by using a file descriptor.
Syntax:
#include <sys/stat.h>
int fstat(int fd,             // file descriptor
          struct stat *buf);  // returned information
Returns 0 on success or -1 on error.
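The three calls can be compared side by side. In this sketch, which uses helper names of our own choosing, the file size is obtained once through stat() and once through fstat(), and lstat() is used to detect a symbolic link, since stat() would silently follow it.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size in bytes via stat(); follows symbolic links. -1 on error. */
long size_by_path(const char *path)
{
    struct stat sb;
    if (stat(path, &sb) == -1)
        return -1;
    return (long)sb.st_size;
}

/* Same answer via fstat() on an open descriptor. -1 on error. */
long size_by_fd(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    struct stat sb;
    long size = (fstat(fd, &sb) == 0) ? (long)sb.st_size : -1;
    close(fd);
    return size;
}

/* 1 if path itself is a symbolic link, 0 if not, -1 on error. */
int is_symlink(const char *path)
{
    struct stat sb;
    if (lstat(path, &sb) == -1)
        return -1;
    return S_ISLNK(sb.st_mode) ? 1 : 0;
}
```

Given a 5-byte file and a symbolic link pointing to it, size_by_path() reports 5 for both names, while is_symlink() returns 1 only for the link.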
ioctl: The ioctl system call controls character devices (character special files), that is, devices through which the system transmits data one character at a time, for example a keyboard, virtual terminal or serial modem.
Syntax:
#include <sys/ioctl.h>
int ioctl(int fd,     // file descriptor
          int req,    // request
          ...);       // arguments that depend on request
Returns -1 on error. Some other value on success.
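The general shape of an ioctl call (descriptor, request code, request-specific argument) can be sketched with the FIONREAD request, which asks how many bytes are waiting to be read. Note the assumptions: FIONREAD is not mentioned in the text above and is not part of POSIX, although Linux and the BSDs provide it through <sys/ioctl.h>; the helper name bytes_waiting is ours. The request also works on pipes, which makes it easy to try without a real device.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Returns the number of bytes waiting to be read on fd,
   or -1 if the request fails.
   Assumption: the system supports the FIONREAD request.
   (bytes_waiting is an illustrative name.) */
int bytes_waiting(int fd)
{
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) == -1)   /* third argument depends on the request */
        return -1;
    return n;
}
```

After three bytes are written into a pipe, bytes_waiting() on its read end reports 3.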
Umask: This system call sets the file mode creation mask of the calling process. Permission bits set in the mask are turned off in the permissions of every file the process subsequently creates.
Syntax:
#include <sys/stat.h>
mode_t umask(mode_t cmask);       // new mask
Returns previous mask (no error return).
cmask is a bitwise OR of any of the following
·         S_IRUSR           – user Read
·         S_IWUSR         – user write
·         S_IXUSR           – user execute
·         S_IRGRP          – group read
·         S_IWGRP         – group write
·         S_IXGRP           – group execute
·         S_IROTH          – other Read
·         S_IWOTH         – other write
·         S_IXOTH          – other execute
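The effect of the mask can be observed by creating a file while a given mask is in force and then reading back the permissions it actually received. The following is a minimal sketch; mode_after_umask is a hypothetical helper, and it assumes the target directory has no default ACLs that would alter the result.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path asking for the `requested` permissions while `mask`
   is the creation mask. Returns the permission bits the new file
   actually received, or -1 on error.
   (mode_after_umask is an illustrative name.) */
int mode_after_umask(const char *path, mode_t mask, mode_t requested)
{
    mode_t old = umask(mask);                 /* install the new mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, requested);
    umask(old);                               /* restore the previous mask */
    if (fd == -1)
        return -1;

    struct stat sb;
    int result = (fstat(fd, &sb) == 0) ? (int)(sb.st_mode & 0777) : -1;
    close(fd);
    return result;
}
```

With a mask of S_IWGRP | S_IWOTH (octal 022), a file requested with mode 0666 is created with mode 0644.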
dup:
The dup() system call duplicates an open file descriptor. The new file descriptor has the following properties in common with the original file descriptor:
  • refers to the same open file or pipe.
  • has the same file pointer – that is, both file descriptors share one file pointer.
  • has the same access mode, whether read, write, or read and write.
Syntax: #include<unistd.h>
             int dup(int oldfd);                   // old file descriptor
            Returns new file descriptor or -1 on error.
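The shared file pointer is the property that most often surprises people, so here is a small sketch that makes it visible: bytes are read through the original descriptor, and the offset is then queried through the duplicate. The function name offset_seen_by_dup is ours.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open path, duplicate the descriptor, read n bytes (at most 64)
   through the original, and return the file offset as seen
   through the duplicate, or -1 on error.
   (offset_seen_by_dup is an illustrative name.) */
long offset_seen_by_dup(const char *path, size_t n)
{
    char buf[64];
    if (n > sizeof buf)
        n = sizeof buf;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    int copy = dup(fd);
    if (copy == -1) {
        close(fd);
        return -1;
    }

    long result = -1;
    if (read(fd, buf, n) == (ssize_t)n)
        result = (long)lseek(copy, 0, SEEK_CUR);   /* query only, no move */
    close(copy);
    close(fd);
    return result;
}
```

After reading 4 bytes through the original descriptor, the duplicate also reports an offset of 4.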
dup2:
This system call also duplicates an open file descriptor, but lets the caller choose the number of the new descriptor. If newfd is already open, it is closed first.
Syntax:
#include <unistd.h>
int dup2(int oldfd,    // old file descriptor
         int newfd);   // new file descriptor to use
Returns new file descriptor or -1 on error.
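A common use of dup2 is redirecting standard output to a file and restoring it afterwards. The sketch below is one way to do that, under the assumption that stdout is open when the function is called; print_to_file is a hypothetical name. dup() saves the original stdout so that it can be put back.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Print msg with stdout temporarily redirected to path.
   Returns 0 on success or -1 on error.
   (print_to_file is an illustrative name.) */
int print_to_file(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    fflush(stdout);                        /* flush text meant for the old stdout */
    int saved = dup(STDOUT_FILENO);        /* remember the real stdout */
    if (saved == -1 || dup2(fd, STDOUT_FILENO) == -1) {
        if (saved != -1)
            close(saved);
        close(fd);
        return -1;
    }
    close(fd);                             /* descriptor 1 now refers to the file */

    printf("%s", msg);
    fflush(stdout);                        /* push the text into the file */

    dup2(saved, STDOUT_FILENO);            /* put the real stdout back */
    close(saved);
    return 0;
}
```

After print_to_file("out.txt", "hello\n") returns, the file holds the six bytes of the message and later printf calls go to the original standard output again.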
