Module 12 - Working With Files

Read and Write to Files

In Python, there are several important functions and practices to be familiar with when dealing with files. The main tools are the built-in open() function and the methods of the file object that it returns. Below, I'll explain each step thoroughly to help beginning programming students understand the concepts and best practices.

Opening a File

To work with a file in Python, you need to open it first. The open() function is used for this purpose. It takes one required argument, the file name (or file path), which is usually followed by the mode in which the file will be opened. If you leave the mode out, Python uses 'r' (read text). The mode indicates how the file will be used, such as reading, writing, or appending data.

Here are the common modes:

  • 'r': Read mode (default). This allows you to read data from the file. The file must already exist.
  • 'w': Write mode. This will create a new file or overwrite the existing content if the file already exists.
  • 'a': Append mode. This will add data to the end of the file without overwriting existing content. If the file does not exist, it is created.
  • 'b': Binary mode. This is added to one of the modes above (for example, 'rb' or 'wb') to work with binary files, like images or executables.
  • 't': Text mode (default). This is added the same way (for example, 'rt') for text files and is the same as not specifying anything.
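To see how the mode letters combine, here is a small sketch. It assumes a throwaway file called modes_demo.txt inside a temporary folder (both are made up for this illustration), so nothing in your project folder is touched.

```python
import os
import tempfile

# Hypothetical file in a temporary folder (an assumption for this demo)
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "modes_demo.txt")

# 'w' is the same as 'wt': write text
with open(file_path, 'w') as file:
    file.write("abc")

# 'r' is the same as 'rt': read() returns a string
with open(file_path, 'r') as file:
    text = file.read()

# 'rb' adds binary mode to read mode: read() returns bytes instead
with open(file_path, 'rb') as file:
    raw = file.read()

print(type(text), text)
print(type(raw), raw)
```

The same file gives you a str in text mode and a bytes object in binary mode.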

Example: Opening a file in read mode and reading its content.

file_path = "example.txt"
with open(file_path, 'r') as file:
    content = file.read()
    print(content)

Reading from a File

After opening a file in read mode, you can read its content using several methods provided by the file object. The most commonly used methods are:

  • read(): Reads the entire content of the file as a single string.
  • readline(): Reads a single line from the file.
  • readlines(): Reads all lines from the file and returns them as a list of strings.
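The difference between these three methods is easiest to see side by side. The sketch below first creates a small file named three_lines.txt in a temporary folder (a made-up name, used only for this demonstration) and then reads it three different ways.

```python
import os
import tempfile

# Create a small sample file (hypothetical name, temporary folder)
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "three_lines.txt")
with open(file_path, 'w') as file:
    file.write("first\nsecond\nthird\n")

# read(): everything as one string
with open(file_path, 'r') as file:
    whole_text = file.read()

# readline(): one line per call, newline character included
with open(file_path, 'r') as file:
    first_line = file.readline()
    second_line = file.readline()

# readlines(): a list with one string per line
with open(file_path, 'r') as file:
    all_lines = file.readlines()

print(repr(whole_text))
print(repr(first_line), repr(second_line))
print(all_lines)
```

Notice that readline() and readlines() keep the newline character at the end of each line, which is why strip() is often used when printing.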

Example: Reading all lines from a file and printing them.

file_path = "example.txt"
with open(file_path, 'r') as file:
    lines = file.readlines()
    for line in lines:
        print(line.strip())  # strip() removes the newline character at the end of each line

Writing to a File

To write data to a file, you need to open it in write mode using 'w' as the mode argument. When using write mode, if the file already exists, it will be truncated (emptied). If the file doesn't exist, a new one will be created.

You can write data to the file using the write() method of the file object.

Example: Writing data to a file.

file_path = "output.txt"
data = "Hello, this is some data that will be written to the file."

with open(file_path, 'w') as file:
    file.write(data)
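One detail that surprises many beginners is that write() does not add a newline for you. The sketch below, which assumes a made-up file called fruit.txt in a temporary folder, writes a list of items one per line by adding "\n" itself.

```python
import os
import tempfile

# Hypothetical output file in a temporary folder
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "fruit.txt")

items = ["apples", "bananas", "cherries"]

with open(file_path, 'w') as file:
    for item in items:
        # write() writes exactly what it is given, so add the newline ourselves
        file.write(item + "\n")

# Read the file back to check the result
with open(file_path, 'r') as file:
    content = file.read()

print(content)
```

Without the + "\n", all three words would end up on a single line.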

Appending to a File

If you want to add data to the end of an existing file without overwriting its content, you can open the file in append mode using 'a' as the mode argument.

You can append data to the file using the write() method as well.

Example: Appending data to a file.

file_path = "existing_file.txt"
data_to_append = "This data will be appended to the end of the file."

with open(file_path, 'a') as file:
    file.write(data_to_append)
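To see the difference between write mode and append mode, here is a short sketch that assumes a made-up file called log.txt in a temporary folder. It writes one line and then appends two more.

```python
import os
import tempfile

# Hypothetical log file in a temporary folder
folder = tempfile.mkdtemp()
log_path = os.path.join(folder, "log.txt")

# 'w' starts the file fresh
with open(log_path, 'w') as file:
    file.write("line 1\n")

# 'a' keeps what is already there and adds to the end
with open(log_path, 'a') as file:
    file.write("line 2\n")

with open(log_path, 'a') as file:
    file.write("line 3\n")

with open(log_path, 'r') as file:
    content = file.read()

print(content)
```

If the second and third open() calls used 'w' instead of 'a', the file would contain only "line 3".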

Closing a File

When you are done working with a file, it is good practice to close it. Python automatically closes the file when it leaves a with block (the with statement uses a context manager), so no explicit close() call is needed there. When you open a file without a with block, you need to call close() yourself.

Example: Explicitly closing a file.

file_path = "example.txt"
file = open(file_path, 'r')
content = file.read()
file.close()
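If an error happens between open() and close(), the close() line is never reached. One common way to guard against that, sketched below with a made-up file called notes.txt in a temporary folder, is to put close() in a finally block, which runs whether or not an error occurred.

```python
import os
import tempfile

# Create a hypothetical file to read (temporary folder)
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "notes.txt")
with open(file_path, 'w') as setup_file:
    setup_file.write("some notes")

file = open(file_path, 'r')
try:
    content = file.read()
finally:
    # Runs even if read() raises an error
    file.close()

print(content)
print(file.closed)  # True once close() has been called
```

This is essentially what the with block does for you automatically, which is why with open(...) is usually preferred.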

Handling File Exceptions

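When working with files, there might be scenarios where errors occur. It is essential to handle these errors gracefully. Common file-related exceptions include FileNotFoundError (when the file does not exist), PermissionError (when the file permissions restrict access), and OSError (the general class for I/O-related errors; in Python 3, IOError is an older name for the same exception). FileNotFoundError and PermissionError are both kinds of OSError, so list the more specific exceptions first.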

To handle these exceptions, you can use try and except blocks.

Example: Handling file-related exceptions.

file_path = "nonexistent_file.txt"
try:
    with open(file_path, 'r') as file:
        content = file.read()
        print(content)
except FileNotFoundError:
    print("File not found. Please check the file path.")
except IOError as e:
    print("An error occurred while reading the file:", str(e))

Best Practices

  • Always use with open(...) when working with files. It ensures the file is automatically closed when you're done with it.
  • Use meaningful variable names for file paths and file objects to improve code readability.
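  • Checking whether the file exists (using os.path.exists()) before opening it lets you give the user a friendlier message, but it does not replace exception handling, because the file can still be missing or unreadable by the time you open it.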
  • Always handle file exceptions to avoid program crashes and provide helpful error messages to the user.
  • When dealing with text files, pass an encoding to open() (e.g., encoding='utf-8') so that special characters are handled the same way on every computer.
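As a small illustration of the encoding advice, the sketch below writes and reads a word containing an accented character. The file name menu.txt and the temporary folder are assumptions made for this demo.

```python
import os
import tempfile

# Hypothetical file in a temporary folder
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "menu.txt")

word = "caf\u00e9"  # the word "cafe" with an accented e

# Name the encoding when writing...
with open(file_path, 'w', encoding='utf-8') as file:
    file.write(word)

# ...and use the same encoding when reading
with open(file_path, 'r', encoding='utf-8') as file:
    text = file.read()

# In binary mode we can see the bytes that were actually stored
with open(file_path, 'rb') as file:
    raw = file.read()

print(text)
print(len(text), "characters stored as", len(raw), "bytes")
```

The accented letter is one character but takes two bytes in UTF-8, which is why reading a file with the wrong encoding can produce garbled text.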

Working with files is an essential skill for any programmer, and mastering these concepts will allow beginning students to handle file-related tasks effectively and efficiently in Python.



Manage Files

Python provides several built-in modules for managing file system functions, making it convenient for beginning programming students to perform various file operations. Below is an overview of some important file system-related libraries in Python:

os module:

The os module is a fundamental part of Python's standard library and provides a wide range of functions for interacting with the operating system, including file system operations. It's the most commonly used module for basic file handling tasks.
Key functions for file system operations in the os module:

  • os.path.exists(path): Checks if a file or directory exists at the specified path.
  • os.path.isfile(path): Checks if the given path points to a regular file.
  • os.path.isdir(path): Checks if the given path points to a directory.
  • os.remove(path): Deletes a file at the specified path.
  • os.mkdir(path): Creates a directory at the specified path.
  • os.makedirs(path): Creates directories recursively along the specified path.
  • os.rename(src, dst): Renames a file or directory from src to dst.
  • os.listdir(path): Returns a list of all files and directories in the specified directory.

Example:

import os

# Checking if a file exists before deleting it
if os.path.exists('example.txt'):
    print("The file exists.")
    # Deleting a file (os.remove raises FileNotFoundError if the file is missing)
    os.remove('example.txt')
else:
    print("The file does not exist.")

# Creating a directory
os.mkdir('my_directory')

# Renaming a file
os.rename('old_name.txt', 'new_name.txt')

# Listing files in a directory
files = os.listdir('.')
print(files)
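The example above assumes that files such as old_name.txt already exist in your folder. If you would like something you can run safely as-is, here is a sketch that does all of its work inside a temporary folder. The folder and file names (reports, draft.txt, final.txt) are made up for the demonstration.

```python
import os
import tempfile

# Work inside a temporary folder so no real files are affected
base = tempfile.mkdtemp()

# makedirs creates every missing folder along the path;
# exist_ok=True means "don't raise an error if it is already there"
nested = os.path.join(base, "reports", "2024")
os.makedirs(nested, exist_ok=True)

# Create a file, then rename it
old_path = os.path.join(nested, "draft.txt")
new_path = os.path.join(nested, "final.txt")
with open(old_path, 'w') as file:
    file.write("report text")
os.rename(old_path, new_path)

listing = os.listdir(nested)
print(listing)  # ['final.txt']

# Only delete the file if it is really there
if os.path.isfile(new_path):
    os.remove(new_path)

still_exists = os.path.exists(new_path)
folder_is_dir = os.path.isdir(nested)
print(still_exists, folder_is_dir)  # False True
```

Using os.path.join() to build paths, rather than typing slashes by hand, keeps the script working on Windows, macOS, and Linux.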

shutil module:

The shutil module provides higher-level file operations and additional functionalities compared to the os module. It is particularly useful for file copying, moving, and archiving.

Key functions in the shutil module:

  • shutil.copy(src, dst): Copies a file from src to dst.
  • shutil.copy2(src, dst): Copies a file, preserving metadata (timestamps, permissions, etc.).
  • shutil.copytree(src, dst): Recursively copies a directory and its contents to dst. By default, dst must not already exist.
  • shutil.move(src, dst): Moves a file or directory from src to dst.
  • shutil.rmtree(path): Recursively deletes a directory and its contents.

Example:

import shutil

# Copying a file
shutil.copy('source_file.txt', 'destination_file.txt')

# Moving a file
shutil.move('old_location.txt', 'new_location.txt')

# Copying a directory
shutil.copytree('source_directory', 'destination_directory')

# Deleting a directory and its contents
shutil.rmtree('directory_to_delete')
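Because shutil.rmtree() deletes things permanently, it is worth practicing in a safe place first. The sketch below runs entirely inside a temporary folder; the names source, backup, notes.txt, and moved.txt are assumptions chosen for this demo.

```python
import os
import shutil
import tempfile

# Everything happens inside a temporary folder
base = tempfile.mkdtemp()
source_dir = os.path.join(base, "source")
os.mkdir(source_dir)

notes_path = os.path.join(source_dir, "notes.txt")
with open(notes_path, 'w') as file:
    file.write("hello")

# Copy one file
shutil.copy(notes_path, os.path.join(source_dir, "notes_copy.txt"))

# Copy the whole folder (the destination must not exist yet)
backup_dir = os.path.join(base, "backup")
shutil.copytree(source_dir, backup_dir)
backup_files = sorted(os.listdir(backup_dir))
print(backup_files)

# Move a file out of the backup folder
moved_path = os.path.join(base, "moved.txt")
shutil.move(os.path.join(backup_dir, "notes_copy.txt"), moved_path)

# Delete the backup folder and everything left inside it
shutil.rmtree(backup_dir)
print(os.path.exists(backup_dir))  # False
```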

glob module:

The glob module is used for file pattern matching, enabling you to retrieve lists of files based on wildcard patterns.

Key function in the glob module:

  • glob.glob(pattern): Returns a list of file paths that match the specified pattern.

Example:

import glob

# Get a list of all text files in the current directory
txt_files = glob.glob('*.txt')
print(txt_files)
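Patterns can also include a folder path, and glob.glob() accepts recursive=True so that the special pattern ** matches subfolders as well. The sketch below builds a few made-up files (a.txt, b.txt, c.csv, and sub/d.txt) in a temporary folder and compares the two searches.

```python
import glob
import os
import tempfile

# Build a small hypothetical folder tree to search
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "sub"))
for name in ["a.txt", "b.txt", "c.csv", os.path.join("sub", "d.txt")]:
    with open(os.path.join(base, name), 'w') as file:
        file.write("x")

# Text files directly inside base
top_level = sorted(glob.glob(os.path.join(base, "*.txt")))

# Text files in base and in any subfolder
everywhere = sorted(glob.glob(os.path.join(base, "**", "*.txt"), recursive=True))

# Show just the file names rather than the long temporary paths
top_names = [os.path.basename(path) for path in top_level]
all_names = [os.path.basename(path) for path in everywhere]
print(top_names)
print(all_names)
```

The results are sorted here because glob.glob() does not promise any particular order.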

These built-in libraries make it straightforward for beginning programming students to manage file system functions in Python. Students can use them to perform common file operations like file existence checks, file copying, moving, renaming, and directory creation and deletion. Encourage students to experiment with these functions to develop a better understanding of file system operations and how Python can be used to manage files effectively.



Serializing Objects

The word serialize refers to the process of turning a Python object (a list, a dictionary, or almost anything else that can be assigned to a variable) into a string or a sequence of bytes that can be saved to a file. In the next two sections, we will examine two popular Python libraries that are built for the purpose of serializing objects.



Using the Pickle Library

The pickle library in Python is used for serializing and deserializing Python objects. Serialization is the process of converting objects in memory into a format that can be easily stored, transmitted, or shared. Deserialization, on the other hand, is the process of reconstructing the original Python objects from the serialized data. The pickle module allows you to save complex data structures, such as lists, dictionaries, classes, and custom objects, into a binary format.

The primary functions in the pickle module are pickle.dump() and pickle.load(). pickle.dump() is used to serialize Python objects and write them to a file, while pickle.load() reads the serialized data from a file and reconstructs the original objects.

Usage of pickle.dump()

The pickle.dump() function serializes the Python object and writes it to a file.

Syntax:

import pickle

with open('file_name.pkl', 'wb') as file:
    pickle.dump(object_to_serialize, file)

Explanation:

  • file_name.pkl: The name of the file where the serialized data will be saved. The extension .pkl is commonly used for pickle files, but you can use any file extension you prefer.
  • object_to_serialize: The Python object you want to serialize and store.

Example:

import pickle

data = {
    'name': 'John',
    'age': 30,
    'email': 'john@example.com'
}

with open('data.pkl', 'wb') as file:
    pickle.dump(data, file)

In this example, we have a dictionary called data, and we serialize and save it in the file named data.pkl.

Usage of pickle.load()

The pickle.load() function reads the serialized data from a file and reconstructs the original Python object.

Syntax:

import pickle

with open('file_name.pkl', 'rb') as file:
    loaded_object = pickle.load(file)

Explanation:

  • file_name.pkl: The name of the file from which the serialized data will be read.
  • loaded_object: The Python object that will be reconstructed from the serialized data.

Example:

import pickle

with open('data.pkl', 'rb') as file:
    loaded_data = pickle.load(file)

print(loaded_data)

In this example, we read the serialized data from the file data.pkl and load it back into the variable loaded_data.
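One nice property of pickle is that the loaded object keeps its original Python types. The sketch below illustrates this with a dictionary that contains a tuple and a set. The file name record.pkl, the temporary folder, and the dictionary contents are assumptions made for the demo.

```python
import os
import pickle
import tempfile

# Hypothetical pickle file in a temporary folder
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "record.pkl")

record = {
    'name': 'John',
    'scores': (90, 85),          # a tuple
    'tags': {'staff', 'admin'},  # a set
}

# Pickle files are binary, so use 'wb' and 'rb'
with open(file_path, 'wb') as file:
    pickle.dump(record, file)

with open(file_path, 'rb') as file:
    loaded_record = pickle.load(file)

print(loaded_record == record)
print(type(loaded_record['scores']))
print(type(loaded_record['tags']))
```

The loaded dictionary is equal to the original, and the tuple and set come back as a tuple and a set.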

Use Cases and Considerations

The pickle library is useful when you need to save and load complex data structures, especially when working with machine learning models, custom objects, or large datasets. However, there are a few important considerations to keep in mind:

  • Security: The pickle module can execute arbitrary code when loading data, making it potentially unsafe when dealing with untrusted sources. Avoid loading pickled data from untrusted or unreliable sources.
  • Compatibility: Pickle files are not guaranteed to load across different versions of Python. In particular, a file written with a newer pickle protocol may not load in an older version. When possible, use the same version of Python for both pickling and unpickling.
  • Human-readable format: Pickle files are in binary format and not human-readable. If human readability is important, consider using other serialization formats like JSON or YAML.
  • Versioning: Be cautious with changes to the objects you are pickling. If you change the structure of an object after pickling it, you may encounter issues when loading the pickled data.
  • Alternatives: For more human-readable and cross-platform options, consider using JSON (json module) or YAML (pyyaml library) for serialization.

Overall, pickle is a powerful library for serializing Python objects, but it should be used with care and awareness of its limitations and potential security risks. When used appropriately, it can simplify the process of saving and loading complex data structures in Python.



Using the JSON Library

The json module in Python provides functions for serializing and deserializing data in JSON (JavaScript Object Notation), a human-readable and platform-independent text format. JSON is commonly used for data interchange between applications and is widely supported across various programming languages.

In Python, the JSON library is part of the standard library, so you don't need to install anything separately to use it.

Serialization using json.dump()

The json.dump() function serializes Python objects and writes them to a file in JSON format.

Syntax:

import json

with open('file_name.json', 'w') as file:
    json.dump(object_to_serialize, file)

Explanation:

  • file_name.json: The name of the file where the JSON data will be saved.
  • object_to_serialize: The Python object you want to serialize and store.

Example:

import json

data = {
    'name': 'John',
    'age': 30,
    'email': 'john@example.com'
}

with open('data.json', 'w') as file:
    json.dump(data, file)

In this example, we have a dictionary called data, and we serialize and save it in the file named data.json in JSON format.

Deserialization using json.load()

The json.load() function reads JSON data from a file and parses it into Python objects.

Syntax:

import json

with open('file_name.json', 'r') as file:
    loaded_object = json.load(file)

Explanation:

  • file_name.json: The name of the file from which the JSON data will be read.
  • loaded_object: The Python object that will be reconstructed from the JSON data.

Example:

import json

with open('data.json', 'r') as file:
    loaded_data = json.load(file)

print(loaded_data)

In this example, we read the JSON data from the file data.json and load it back into the variable loaded_data.
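Because a JSON file is plain text, you can open it in any editor. By default, json.dump() writes everything on one line; passing the optional indent argument makes the file easier to read. The sketch below assumes a made-up file called person.json in a temporary folder.

```python
import json
import os
import tempfile

# Hypothetical JSON file in a temporary folder
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "person.json")

data = {'name': 'John', 'age': 30}

# indent=4 puts each key on its own line, indented by four spaces
with open(file_path, 'w') as file:
    json.dump(data, file, indent=4)

# Read the file as ordinary text to see exactly what was written
with open(file_path, 'r') as file:
    text = file.read()

print(text)
```

The data is the same with or without indent; only the layout of the file changes.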

Use Cases and Considerations

The JSON library is widely used because the format is human-readable and works across platforms. Some considerations when using JSON for serialization are:

  • Human-readable format: JSON data is easy for humans to read and understand, which can be beneficial when you need to inspect or edit the serialized data manually.
  • Data interchange: JSON is commonly used for communication between applications, especially in web services and APIs.
  • Simplicity: JSON is well-suited for simple data structures like dictionaries and lists. However, it has limited support for more complex Python objects (e.g., custom classes with methods).
  • Platform independence: JSON is supported across multiple programming languages, making it a good choice for interoperability between different systems.
  • Limitations: JSON has limitations when serializing certain Python data types. Tuples are converted to lists during serialization, and sets cannot be serialized at all (json.dump() raises a TypeError) unless you convert them to a list first.
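The type limitations are easy to check for yourself. The sketch below uses json.dumps() and json.loads(), which work like json.dump() and json.load() but with a string instead of a file. The variable names and sample values are made up for the demonstration.

```python
import json

# A tuple goes in...
point = (3, 4)
text = json.dumps({'point': point})
restored = json.loads(text)

# ...but a list comes back
print(text)
print(restored['point'], type(restored['point']))

# A set cannot be serialized directly
tags = {'staff', 'admin'}
try:
    json.dumps({'tags': tags})
    set_failed = False
except TypeError as error:
    set_failed = True
    print("Could not serialize the set:", error)

# One workaround: convert the set to a list first
tags_text = json.dumps({'tags': sorted(tags)})
print(tags_text)
```

If your program depends on a value being a tuple or a set, remember to convert it back after loading.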

Example:

import json

# Serializing a list of dictionaries and writing to a JSON file
data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Charlie', 'age': 22}
]

with open('data.json', 'w') as file:
    json.dump(data, file)

# Deserializing the JSON data and printing the loaded list
with open('data.json', 'r') as file:
    loaded_data = json.load(file)

print(loaded_data)

In this example, we serialize a list of dictionaries containing people's information to a JSON file, and then we read the JSON data from the file and load it back into the loaded_data variable.

JSON is a versatile and widely used format for data serialization in Python and beyond. It provides a simple and readable way to store and exchange data between applications.

Videos for Module 12 - Working With Files

12-1: Working With Files in Python (2:26)

12-2: Opening a File in Python (7:42)

12-3: Reading a File in Python (5:50)

12-4: Writing to a File in Python (3:23)

12-5: Closing a File (2:54)

12-6: Removing and Verifying Files in Python (3:32)

12-7: Sandbox 12 Explanation (4:06)

12-8: A12 Introduction (1:45)

12-9: Challenge 1 - Web Spidering Application (11:45)

12-10: Challenge 2 - Saving Data Using Pickle (14:51)

Activities for this Module

S12 - Draw From a File

Note: Sandbox assignments are designed to be formative activities that are somewhat open-ended. To get the most value, spend some time playing around as you code.

Using files with our Python scripts gives us some abilities that we don’t have without them. Saving data after a program quits running so we can go back to it is a big advantage in computing, because so many things that we do require that “data persistence” over time. Another important thing that files can do for us is to supply data that our scripts can use as they run.

This week, we will use external files, along with a script that can take that data, understand it, and then draw something on the screen using the Python Turtle Module. I’ve provided the script code below.  Copy and paste it into a new Python file.

import turtle

def execute_turtle_commands(filename):
    t = turtle.Turtle()
    screen = turtle.Screen()

    try:
        with open(filename, 'r') as file:
            for line in file:
                # Split each line into command and value
                parts = line.strip().split()
                command = parts[0]
                value = int(parts[1])

                # Execute the command
                if command == "fd":
                    t.forward(value)
                elif command == "rt":
                    t.right(value)
                elif command == "lt":
                    t.left(value)
                else:
                    print(f"Unknown command: {command}")
    except FileNotFoundError:
        print("File not found. Please check the file path and try again.")
    except Exception as e:
        print(f"An error occurred: {e}")

    # Click on screen to close the window
    screen.exitonclick()

# Example usage
execute_turtle_commands("rectangle.txt")

Before you run this code, there is one more thing to do. You might have noticed that the script is looking for a file called “rectangle.txt”. Create a file called rectangle.txt in your main project folder, and then copy and paste the lines below into it:

fd 100
rt 90
fd 50
rt 90
fd 100
rt 90
fd 50

Now, run the Python script that you made in the earlier step. It should make a rectangle.

Can you make a file called “triangle.txt” that will make a triangle?

Perhaps you are starting to see how something like this, even with it being so simple, could be very useful and powerful. Essentially, you could draw any shape without changing the code itself, by just changing the very simple text file.

For this sandbox challenge, your goal is to modify the script to add additional capabilities. It can be anything you want. For example, you could add commands for drawing a circle of a specific radius or for changing the line thickness or color.

Once you have made your modifications to the script, add appropriate commands to your text file to show it off.

A12a - Web Crawler

The Challenge

Challenge 1: Build a web crawler application that will scan web pages for links and follow them, scanning for more links until the application doesn’t find any new web pages on the website. The program should output a list of the pages with some indication of how they are connected to each other (I suggest an outline format). The sample code in the Resources section below can be used to access a website and download the HTML code for a page.

Resources

Download the sample code for using the URLLib module

Constraints / Success Criteria

  • Must crawl the entire site
  • Must not go beyond the site
  • Must print out a list of pages crawled.
  • All code must be commented.
  • Use only what we have covered in this class up to this point

Submission Type

Please submit the complete program as a .py file.

A12b - Using Pickle to Save Objects

The Challenge

You may choose one of the two following challenges to complete this assignment:

Enhance the employee database assignment so that the database data is stored in a file, and that file is loaded when the program starts, and updated each time there is a change to the data.  I would suggest using the Pickle module to save your database (your dictionary) to a file, and calling that function whenever a change is made to the data. Another function should load the data using Pickle and return the dictionary.

Resources

Constraints / Success Criteria

  • Must save data to a text file.
  • Must load data from the text file upon startup.
  • All code must be commented.
  • Use only what we have covered in this class up to this point

Submission Type

Please submit the complete program as a .py file.