🚀 A Deep Dive into Screen Recording on macOS with ChatGPT and ffmpeg!

Ever wondered how to capture your screen activity on a Mac? Well, I had a delightful chat with ChatGPT that turned into a mini-masterclass on using ffmpeg for screen recording. Here’s a fun recap of our conversation and what I learned along the way!

🤯 Quick Summary: Chatting with ChatGPT

Me: Hey, how can I record my screen on macOS?

ChatGPT: You can use ffmpeg! It’s a robust command-line tool perfect for recording and converting multimedia. Try this command:

ffmpeg -f avfoundation -i "1" output.mp4

Make sure to adjust the -i parameter to match your screen. This method is not only straightforward but also highly customizable.

🎬 Insights Gained from ffmpeg

Embarking on a journey of continuous screen recording, I quickly recognized the potential strain on my battery and the looming threat to my disk space. Fortunately, ffmpeg emerged as a hero, equipped with a suite of clever techniques to mitigate these concerns.

🎥 Frame Rate Magic

Inspired by Rewind.ai, I decided to capture one frame every two seconds to save on resources. Here’s how you can do it:

ffmpeg -f avfoundation -framerate 0.5 -i "1" output.mp4

The -framerate flag here is your golden ticket to controlling the input frame rate. Lowering the frame rate means fewer frames to process and store, which significantly reduces the load on your system. In practice, the frame rate only drops as far as the avfoundation device allows: on my machine, requesting 0.5 actually gave me 4 FPS.
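To see what frame rate a recording actually ended up with, one option is to ask ffprobe (it ships with ffmpeg) for the average frame rate of the video stream. The sketch below is illustrative only: the helper names rational_to_decimal and measured_fps are made up for this post, and it assumes output.mp4 is the file produced by the command above.

```shell
#!/bin/bash

# Convert a rational such as "4/1" or "1/2" (the form ffprobe prints) to a decimal.
# Hypothetical helper, not part of ffmpeg.
rational_to_decimal() {
    awk -v r="$1" 'BEGIN {
        n = split(r, p, "/")
        if (n < 2) p[2] = 1                      # plain number, no denominator
        if (p[2] == 0) { print "0.00"; exit }    # avoid dividing by zero
        printf "%.2f\n", p[1] / p[2]
    }'
}

# Ask ffprobe for the average frame rate of the first video stream.
measured_fps() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=avg_frame_rate \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Only probe when the tool and the file are actually present.
if command -v ffprobe >/dev/null 2>&1 && [ -f output.mp4 ]; then
    rational_to_decimal "$(measured_fps output.mp4)"
fi
```

If the printed value is well above the rate you asked for, the capture device is probably clamping the request.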

📉 Choosing the Right Resolution

To further reduce the load, I opted for a lower resolution. I chose 1080p because it is a good default: it is high-res enough that text from a Retina display is still legible after scaling down. Here’s the command:

ffmpeg -f avfoundation -framerate 0.5 -i "1" -s hd1080 output.mp4

I faced a decision between using -s and using -vf with the scale filter. I first chose -s because it is the more straightforward of the two to write. Used as an output option, -s simply appends a scale filter to the end of the filter chain, so it is not really cheaper than -vf; it only gives less control over where the scaling happens. I later transitioned back to -vf so that I could place the scaling step exactly where I wanted it among the other filters.

🚀 Turbo-Charged Encoding with Hardware Acceleration

Since I’m using a Mac with Apple Silicon, I leveraged built-in hardware acceleration for encoding to the h.265 (HEVC) format, which offers better compression and quality for the same file size compared to h.264:

ffmpeg -f avfoundation -framerate 0.5 -i "1" -s hd1080 -c:v hevc_videotoolbox output.mp4
  • hevc_videotoolbox: Utilizes Apple’s hardware acceleration for efficient encoding.

The following settings only work with software encoding, not with hardware-accelerated encoding, so I could not use them here:

  • crf (Constant Rate Factor): Balances quality and compression. In x265 the scale runs from 0 to 51, where lower values mean higher quality; around 18 is often considered visually near lossless, and 28 is the default. I opted for 28 as a middle ground for decent quality without excessive file size.
  • preset: Influences the encoding speed and efficiency. Options include medium, slow, and veryslow. I selected slow to optimize file size without overly taxing the processor. While veryslow achieves the smallest file size, it significantly increases processing demands and battery usage.

🔄 Handling Variable Frame Rate

While dynamic frame rate adjustments are great, they don’t mesh well with hardware acceleration. So, I stuck with plain bitrate-based encoding, which isn’t perfect but gets the job done. This was the only way I found to set up the size limits and variable frame rate I needed:

ffmpeg -f avfoundation -framerate 0.5 -i "1" -s hd1080 -c:v hevc_videotoolbox -fps_mode vfr -b:v 2K -maxrate 2K -bufsize 8K output.mp4

I could choose a very low frame rate, as I mostly wanted to concentrate on static details and text. This kept the file size decently low.
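As a rough back-of-the-envelope check on what that bitrate means for disk space: ffmpeg reads the K suffix as a factor of 1000, so -b:v 2K is a target of 2000 bit/s. The helper below (a hypothetical name, not an ffmpeg feature) turns a target bitrate into an approximate size per hour. It ignores container overhead and assumes the encoder actually hits the target, which a hardware encoder may not do at such low values.

```shell
#!/bin/bash

# Approximate bytes written per hour for a target video bitrate in bit/s.
# Hypothetical helper: ignores container overhead and assumes the target is met.
bytes_per_hour() {
    local bitrate_bps="$1"
    echo $(( bitrate_bps * 3600 / 8 ))
}

bytes_per_hour 2000    # -b:v 2K; prints 900000, i.e. roughly 0.9 MB per hour
```

Comparing this estimate with the real size of a one-hour recording is a quick way to see how closely the encoder follows the requested bitrate.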

🛠️ Alternative: Software Encoding with libx265

For those situations where hardware acceleration conflicts with the need for dynamic frame rate adjustments, switching to software encoding with libx265 offers a viable alternative. This approach allows for more flexibility in handling variable frame rates:

ffmpeg -f avfoundation -framerate 0.5 -i "1" -s hd1080 -c:v libx265 -crf 28 -x265-params "keyint=120:min-keyint=20:no-scenecut" -preset slow output.mp4
  • libx265: Engages the x265 encoder, a software-based HEVC encoder that provides fine-grained control over the encoding process.
  • x265-params: Allows for detailed configuration of the encoder. The keyint and min-keyint parameters control the maximum and minimum interval between IDR-frames, optimizing for variable frame rate content. The no-scenecut option disables scene cut detection, which can be beneficial for maintaining a consistent encoding strategy across varying content.

This software-based approach, while more CPU-intensive, grants the flexibility needed for precise control over video quality and file size, making it an excellent choice for projects requiring dynamic frame rate adjustments.

🛠️ Tweaking the Defaults

Setting a keyframe interval is crucial for maintaining detailed captures, especially over long periods. I set mine to every 5 minutes: at 0.5 FPS, -g 150 means 150 frames × 2 seconds = 300 seconds. The faststart flag ensures that video metadata is positioned at the beginning for quick playback startup:

ffmpeg -f avfoundation -framerate 0.5 -i "1" -s hd1080 -c:v hevc_videotoolbox -g 150 -fps_mode vfr -b:v 2K -maxrate 2K -bufsize 8K -movflags +faststart output.mp4
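The value passed to -g is a frame count, not a duration, so it has to be derived from the frame rate. A minimal sketch of that arithmetic, using a hypothetical helper name gop_size:

```shell
#!/bin/bash

# Number of frames between keyframes for a given frame rate and interval in seconds.
# Hypothetical helper; awk handles the fractional frame rate.
gop_size() {
    local fps="$1" seconds="$2"
    awk -v f="$fps" -v s="$seconds" 'BEGIN { printf "%d\n", f * s }'
}

gop_size 0.5 300    # 5 minutes at 0.5 FPS; prints 150
```

The same -g 150 on a stream that really runs at 4 FPS would give a keyframe every 150 / 4 = 37.5 seconds, so the interval is only 5 minutes when the stream truly runs at 0.5 FPS.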

🌓 Going Grayscale for Extra Savings

Since I mostly code in a light theme, converting the video to grayscale seemed like a smart move to further reduce file size:

ffmpeg -f avfoundation -framerate 0.5 -i "1" -vf "format=gray,scale=-1:1080" -c:v hevc_videotoolbox -g 150 -fps_mode vfr -b:v 2K -maxrate 2K -bufsize 8K -movflags +faststart output.mp4
  1. Reduced File Size: By eliminating color information, the video file size is significantly reduced. This is particularly beneficial for long recording sessions where storage space can become a concern.
  2. Resource Efficiency: Grayscale conversion is less resource-intensive compared to other color adjustments. This means less CPU and memory usage during the encoding process.
  3. Visual Clarity: For coding sessions, where the primary focus is on text and static details, grayscale is sufficient. It ensures that the essential details remain clear and visible without the need for full-color video.

Combining Scaling with Grayscale Conversion

By combining scaling with grayscale conversion in the filtering stage, we streamline the encoding process, making it more efficient. Here’s how:

  • Scaling: Reducing the resolution of the video to 1080p ensures that the file size is manageable while maintaining sufficient detail for readability. This is particularly important for screen recordings where text clarity is crucial.
  • Grayscale Conversion: The format=gray filter simplifies the video by removing color information, which not only reduces the file size but also speeds up the encoding process.

Code Commentary

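  • -vf "format=gray,scale=-1:1080": This part of the command applies the grayscale filter and scales the video to 1080p. The -1 keeps the aspect ratio; scale=-2:1080 does the same but also keeps the width even, which some encoders require.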
  • -c:v hevc_videotoolbox: Utilizes Apple’s hardware acceleration for efficient encoding.
  • -b:v 2K -maxrate 2K -bufsize 8K: These parameters control the bitrate and buffer size, ensuring a balance between quality and file size.
  • -movflags +faststart: Ensures that the video metadata is positioned at the beginning of the file for quick playback startup.

By integrating these techniques, we achieve a highly efficient and resource-friendly screen recording setup that meets the specific needs of coding sessions. This approach not only saves on disk space but also ensures that the essential details remain clear and visible without the need for full-color video.

🛠️ Optimizing Output Frame Rate

The -r flag is typically used to control the output frame rate, but as an output option it drops frames only right before encoding, after every captured frame has already gone through the filter chain. To manage this more efficiently, we need to focus on the filter stage, which is controlled by the -vf (video filter) option.

By using the -vf option, we can precisely control the frame rate during the filtering stage, ensuring that only the necessary frames are processed. This approach reduces the load on the system and optimizes the overall encoding process.

Here’s the command that achieves this:

ffmpeg -f avfoundation -i "1" -framerate 0.5 -vf "fps=0.5,format=gray,scale=-1:1080" -fps_mode vfr -c:v hevc_videotoolbox -b:v 2K -maxrate 2K -bufsize 8K -movflags +faststart output.mp4

Code Commentary

  • -vf "fps=0.5,format=gray,scale=-1:1080": This part of the command applies multiple filters:
    • fps=0.5: Captures one frame every two seconds, significantly reducing the amount of data processed.
    • format=gray: Converts the video to grayscale, reducing file size and processing load.
    • scale=-1:1080: Scales the video to 1080p, ensuring a balance between quality and file size.
  • -fps_mode vfr: Enables variable frame rate mode, which is more efficient for screen recordings with varying content.
  • -c:v hevc_videotoolbox: Utilizes Apple’s hardware acceleration for efficient encoding.
  • -b:v 2K -maxrate 2K -bufsize 8K: These parameters control the bitrate and buffer size, ensuring a balance between quality and file size.
  • -movflags +faststart: Ensures that the video metadata is positioned at the beginning of the file for quick playback startup.

By implementing these optimizations, we can achieve a more efficient and resource-friendly screen recording setup. This approach not only reduces the load on the system but also ensures that the essential details remain clear and visible, making it ideal for coding sessions and other screen recording tasks.
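To put a number on the savings, here is a small sketch that counts how many frames per hour reach the later filters and the encoder at a given rate. The helper name frames_per_hour is made up for illustration; the 4 FPS figure is the rate my capture device actually delivered when I asked for 0.5.

```shell
#!/bin/bash

# Frames produced in one hour at a given frame rate (hypothetical helper).
frames_per_hour() {
    awk -v f="$1" 'BEGIN { printf "%d\n", f * 3600 }'
}

frames_per_hour 4      # what the device delivered; prints 14400
frames_per_hour 0.5    # after the fps=0.5 filter; prints 1800
```

Placing fps=0.5 first in the chain means the grayscale and scale filters only ever see the smaller number.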

🔧 Enhancing Input Flow for Hardware Decoding

To optimize the input flow and leverage hardware decoding, we made some crucial adjustments to the ffmpeg command:

ffmpeg -f avfoundation -framerate 0.5 -pixel_format nv12 -i "2:none" -vf "fps=0.5,format=gray,scale=-1:1080" -fps_mode vfr -c:v hevc_videotoolbox -b:v 2K -maxrate 2K -bufsize 8K -movflags +faststart output.mp4
  • Key Adjustments:

    1. Pixel Format:
    • -pixel_format nv12: We chose nv12 because it is compatible with hardware acceleration. This format ensures that the video processing is optimized for performance, reducing the load on the CPU and taking advantage of the GPU.
    2. Input Source:
    • -i "2:none": This selects the video device with index 2 (avfoundation indices start at 0, so this is the third entry in the list) and no audio device. This is particularly useful for screen recording where audio is not required, further reducing the processing load.
  • Listing Available Devices: To identify the correct input device, you can list all available devices using the following command:

ffmpeg -f avfoundation -list_devices true -i ""

This command will display a list of all video and audio capture devices connected to your system, allowing you to select the appropriate device for your recording needs.
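If you would rather not read the index off by eye, the listing can be filtered for the screen capture entry. This is a minimal sketch: first_screen_index is a hypothetical helper, and the sample listing only approximates the format ffmpeg prints (device names and log prefixes differ per machine).

```shell
#!/bin/bash

# Print the index of the first "Capture screen" device found in an
# avfoundation device listing read from stdin (hypothetical helper).
first_screen_index() {
    sed -n 's/.*\[\([0-9][0-9]*\)\] Capture screen.*/\1/p' | head -n 1
}

# Sample listing in an assumed format; real names and addresses will differ.
sample_listing='[AVFoundation indev @ 0x0] AVFoundation video devices:
[AVFoundation indev @ 0x0] [0] FaceTime HD Camera
[AVFoundation indev @ 0x0] [1] Capture screen 0
[AVFoundation indev @ 0x0] AVFoundation audio devices:
[AVFoundation indev @ 0x0] [0] MacBook Pro Microphone'

printf '%s\n' "$sample_listing" | first_screen_index    # prints 1
```

On a real machine the listing goes to stderr, so the pipe would look like: ffmpeg -f avfoundation -list_devices true -i "" 2>&1 | first_screen_index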

📈 Optimizing the Output Stream

ffmpeg -f avfoundation -framerate 0.5 -pixel_format nv12 -i "2:none" -vf "fps=0.5,format=gray,scale=-1:1080" -fps_mode vfr -an -c:v hevc_videotoolbox -b:v 2K -maxrate 2K -bufsize 8K -movflags +faststart output.mp4

Audio Handling:

  • -an: Ensures that no audio track is included in the output. This is useful for screen recordings where audio is not needed, further reducing the file size and processing requirements.

🎥 Enhancing Video Quality

To further improve the video quality, we made several key adjustments to the ffmpeg command:

ffmpeg -f avfoundation -framerate 0.5 -pixel_format nv12 -i "2:none" -vf "fps=0.5,format=gray,scale=-1:1080,setrange=full,unsharp=5:5:1.3,format=yuv420p" -fps_mode vfr -an -c:v hevc_videotoolbox -b:v 2K -maxrate 2K -bufsize 8K -movflags +faststart output.mp4
  • Key Adjustments and Their Importance: The changes and the order in the -vf (video filter) flag are crucial. The order defines the linear execution flow, ensuring that each step is processed correctly and efficiently. Here’s a detailed breakdown:
  1. Frame Rate Control:
    • fps=0.5: This ensures that only one frame every two seconds is processed, regardless of the input device’s frame rate. This significantly reduces the amount of data processed, optimizing the workflow.
  2. Grayscale Conversion:
    • format=gray: Converts the video to grayscale, removing extra color channels. This step reduces the file size and processing load, making the encoding process more efficient.
  3. Scaling:
    • scale=-1:1080: Scales the video to 1080p, ensuring a balance between quality and file size. This is particularly important for screen recordings where text clarity is crucial.
  4. Color Range Adjustment:
    • setrange=full: This step is a prerequisite for the unsharp filter. It ensures that the full color range is used, which is necessary for the subsequent sharpening process.
  5. Sharpening:
    • unsharp=5:5:1.3: This filter improves text legibility by sharpening the video. This is particularly useful for screen recordings where clear text is essential.
  6. Final Format Conversion:
    • format=yuv420p: Converts the video to a format supported by the hardware encoder. This step ensures compatibility and efficient encoding.
  • Importance of Filter Order: The order of the filters in the -vf flag is critical because each filter processes the output of the previous one. This linear execution flow ensures that each step is applied correctly and efficiently:
    • First, the frame rate is reduced to ensure that only the necessary frames are processed.
    • Next, the video is converted to grayscale, simplifying the data and reducing the file size.
    • Then, the video is scaled to the desired resolution, ensuring clarity and manageability.
    • After that, the color range is adjusted to prepare for sharpening.
    • Subsequently, the sharpening filter is applied to enhance text legibility.
    • Finally, the video is converted to a format supported by the hardware encoder, ensuring efficient encoding.
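One way to keep that order readable and easy to rearrange is to hold the filters in a bash array and join them with commas just before calling ffmpeg. This is only a sketch of the idea; join_by_comma is a hypothetical helper, and the filter values are the ones from the command above.

```shell
#!/bin/bash

# Filters in execution order; each one receives the previous one's output.
filters=(
    "fps=0.5"          # drop frames first so later filters do less work
    "format=gray"      # discard color
    "scale=-1:1080"    # downscale
    "setrange=full"    # range adjustment before sharpening
    "unsharp=5:5:1.3"  # sharpen text
    "format=yuv420p"   # pixel format for the encoder
)

# Join all arguments with commas (hypothetical helper).
join_by_comma() {
    local IFS=,
    echo "$*"
}

vf_chain=$(join_by_comma "${filters[@]}")
echo "$vf_chain"
```

The resulting string can then be passed as -vf "$vf_chain", and reordering or commenting out a single filter becomes a one-line change.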

🛡️ Making the Recording Resilient to Interruptions

To ensure that our screen recording process is resilient to interruptions, such as system crashes or unexpected shutdowns, we made several key adjustments to the ffmpeg command:

ffmpeg -f avfoundation -framerate 0.5 -pixel_format nv12 -i "2:none" -vf "fps=0.5,format=gray,scale=-1:1080,setrange=full,unsharp=5:5:1.3,format=yuv420p" -fps_mode vfr -an -c:v hevc_videotoolbox -b:v 2K -maxrate 2K -bufsize 8K -movflags +frag_keyframe+empty_moov+default_base_moof -g 150 -f segment -segment_time 3600 -reset_timestamps 1 -strftime 1 "%Y-%m-%d_%H-%M-%S_%A_%B_%d.mp4"
  1. MOV Flags:
  • -movflags +frag_keyframe+empty_moov+default_base_moof: These flags are crucial for making the recording process more resilient:
    • +frag_keyframe: Starts a new fragment at each keyframe, ensuring that the video can be played back even if the recording is interrupted.
    • +empty_moov: Writes an empty moov atom at the beginning of the file for compatibility, which is essential for fragmented MP4 files.
    • +default_base_moof: Uses the default base data offset for movie fragments, ensuring proper playback in fragmented MP4 files.
  Together, these flags split the file into fragments, one starting at each keyframe. If ffmpeg gets interrupted, only the last fragment, from the last keyframe to the time of interruption, will get corrupted. With a keyframe every 5 minutes (-g 150 at 0.5 FPS), this means a maximum of 5 minutes of recording will be lost, which is an acceptable trade-off for most use cases.
  2. Segmenting the Recording:
  • -f segment -segment_time 3600 -reset_timestamps 1 -strftime 1 "%Y-%m-%d_%H-%M-%S_%A_%B_%d.mp4": This option starts a new file every hour (-segment_time 3600) and names each file after its start time. This approach ensures that if an interruption or corruption occurs, it will only affect the last file, minimizing data loss.
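To preview what the segment file names will look like, the same pattern can be fed to the date command, which understands the same strftime-style conversions. This is a sketch for checking the pattern only; preview_segment_name is a hypothetical helper, and the exact day and month names ffmpeg writes may depend on your locale.

```shell
#!/bin/bash

# Same pattern as the segment file name in the command above.
pattern="%Y-%m-%d_%H-%M-%S_%A_%B_%d.mp4"

# Render the pattern for the current time.
# LC_ALL=C keeps the day and month names in English.
preview_segment_name() {
    LC_ALL=C date +"$pattern"
}

preview_segment_name
```

The output has the form year-month-day_hour-minute-second_weekday_monthname_day.mp4. Note that the day of the month appears twice, once in the date part and once at the end.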

⚙️ My Setup

As an avid Alfred user, I prefer to streamline my workflow rather than manually setting up and controlling the recording process via the terminal each time. To achieve this, I created a custom Alfred workflow with three essential commands.

🚀 Start Recording

#!/bin/bash

query=$1

# Function to convert a string to camel case and make it safe for file names
to_camel_case() {
    local input="$1"
    # Replace non-alphanumeric characters with spaces
    input=$(echo "$input" | sed -E 's/[^a-zA-Z0-9]+/ /g')
    # Convert to lowercase
    input=$(echo "$input" | tr '[:upper:]' '[:lower:]')
    # Capitalize the first letter of each word
    input=$(echo "$input" | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) tolower(substr($i,2))} 1')
    # Remove spaces to form a camel case string
    input=$(echo "$input" | tr -d ' ')
    # Remove any remaining special characters to make it safe for file names
    input=$(echo "$input" | sed -E 's/[^a-zA-Z0-9]//g')
    echo "$input"
}

# Check if query is not empty
if [ -n "$query" ]; then
    camel_case_query=$(to_camel_case "$query")
    file_postfix="_${camel_case_query}"
else
    file_postfix=""
fi

# Directory to store PID file and recordings
WORKFLOW_DATA_DIR="$alfred_workflow_data"
PID_FILE="$WORKFLOW_DATA_DIR/ffmpeg.pid"
RECORDING_DIR="$WORKFLOW_DATA_DIR/recording"
LOG_FILE="$WORKFLOW_DATA_DIR/start_ffmpeg.log"

# Create the recording directory if it doesn't exist
mkdir -p "$RECORDING_DIR"

# Clear the log file
echo "" > "$LOG_FILE"

# Log a message
echo "Starting ffmpeg script..." >> "$LOG_FILE"

# Check if the PID file exists and if the process is running
if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
    echo "ffmpeg is already running." >> "$LOG_FILE"
    echo "ffmpeg is already running."
    exit 1  # Return code 1 indicates ffmpeg is already running
fi

# Start the ffmpeg process in the background using nohup and save the PID
nohup ffmpeg -f avfoundation -framerate 0.5 -pixel_format nv12 -i "2:none" -vf "fps=0.5,format=gray,scale=-1:1080,setrange=full,unsharp=5:5:1.3,format=yuv420p" -fps_mode vfr -an -c:v hevc_videotoolbox -b:v 2K -maxrate 2K -bufsize 8K -movflags +frag_keyframe+empty_moov+default_base_moof -g 150 -f segment -segment_time 3600 -reset_timestamps 1 -strftime 1 "$RECORDING_DIR/%Y-%m-%d_%H-%M-%S_%A_%B_%d${file_postfix}.mp4" > /dev/null 2>&1 &

FFMPEG_PID=$!
echo $FFMPEG_PID > "$PID_FILE"

echo "ffmpeg started with PID $FFMPEG_PID." >> "$LOG_FILE"
echo "ffmpeg started with PID $FFMPEG_PID."

exit 0  # Return code 0 indicates success

Code Commentary

  1. nohup: Ensures the ffmpeg process ignores hangup signals and continues running even after the script exits.
  2. Output Redirection: > /dev/null 2>&1 ensures that the output of the ffmpeg process does not interfere with the script and is discarded.
  3. Background Process: The & at the end of the command starts the command as a background process.
  4. Save PID: The PID of the ffmpeg process is captured using $! and saved to a file.
  5. Logging: The script logs messages to the log file and outputs the PID to Alfred.
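The PID file check at the top of the script is the part that prevents two recordings from running at once. Isolated into a function, the pattern looks roughly like this. It is a sketch: is_running is a hypothetical helper name, and a throwaway sleep process stands in for ffmpeg so the example can run anywhere.

```shell
#!/bin/bash

# Return success if the PID stored in the given file belongs to a live process.
# kill -0 sends no signal; it only checks that the process exists and can be signalled.
is_running() {
    local pid_file="$1" pid
    [ -f "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Demo with a throwaway background process standing in for ffmpeg.
demo_pid_file=$(mktemp)
sleep 30 &
echo $! > "$demo_pid_file"

if is_running "$demo_pid_file"; then
    echo "running"
fi

# Stop the stand-in process and check again.
kill "$(cat "$demo_pid_file")"
wait 2>/dev/null
is_running "$demo_pid_file" || echo "stopped"
rm -f "$demo_pid_file"
```

A stale PID file left behind by a crash is handled by the same check: the file exists, but kill -0 fails, so the script treats the recorder as not running.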

🛑 Stop Recording

#!/bin/bash

# Directory to store PID file
WORKFLOW_DATA_DIR="$alfred_workflow_data"
PID_FILE="$WORKFLOW_DATA_DIR/ffmpeg.pid"

# Check if the PID file exists and if the process is running
if [ -f "$PID_FILE" ]; then
    FFMPEG_PID=$(cat "$PID_FILE")
    if kill -0 $FFMPEG_PID 2>/dev/null; then
        kill -SIGINT $FFMPEG_PID
        echo "ffmpeg process $FFMPEG_PID stopped."
        rm "$PID_FILE"
        exit 0  # Return code 0 indicates success
    else
        echo "ffmpeg process not running, cleaning up."
        rm "$PID_FILE"
        exit 1  # Return code 1 indicates the process was not running
    fi
else
    echo "No ffmpeg process running."
    exit 1  # Return code 1 indicates no PID file found
fi

📂 Open Recording Directory

#!/bin/bash

# Directory to open in Finder
RECORDING_DIR="$alfred_workflow_data/recording"

# Create the recording directory if it doesn't exist
mkdir -p "$RECORDING_DIR"

# Open the recording directory in Finder
open "$RECORDING_DIR"

By integrating these commands into an Alfred workflow, I can easily start, stop, and access my screen recordings without having to manually enter commands in the terminal. This setup not only saves time but also ensures a seamless recording experience.

🌟 Reflecting on the Journey

Embarking on this adventure with ffmpeg and ChatGPT has not only addressed my screen recording requirements but also opened up a whole new world of video processing insights. ffmpeg stands out as an incredibly versatile toolkit for anyone interested in multimedia, offering a wide range of possibilities for both novices and experts. It’s an invitation to explore, experiment, and enhance your skills.

Key Takeaways

  1. Optimizing Screen Recording:

    • By leveraging ffmpeg’s powerful command-line capabilities, we optimized screen recording on macOS, focusing on efficiency and quality. Techniques such as grayscale conversion, frame rate control, hardware decoding, and output stream optimization were crucial in achieving a resource-friendly setup.
  2. Enhancing Video Quality:

    • We implemented advanced filters and settings to ensure high-quality recordings. The importance of filter order was highlighted, ensuring each step in the video processing pipeline was executed correctly and efficiently.
  3. Resilience to Interruptions:

    • By adding specific MOV flags and segmenting the recording, we made the recording process resilient to interruptions. This approach minimizes data loss and ensures the majority of the recording remains intact.
  4. Streamlined Workflow with Alfred:

    • As an avid Alfred user, I created a custom workflow to start, stop, and access recordings seamlessly. This setup saves time and ensures a smooth recording experience without manual terminal commands.

Future Plans

This experience has also solidified my intention to integrate GPT as a collaborative programming companion. In pursuit of this, I plan to leverage the following tools:

  • Cursor from Cursor.so: As an enthusiastic Visual Studio Code user, I find Cursor’s addition of simple, GPT-friendly interfaces to be a game-changer, making it easier to interact with GPT during coding sessions.
  • ChatGPT 4.0 and GPT-4: These advanced iterations of GPT promise to revolutionize the way I approach problem-solving and coding, offering unparalleled assistance and insights.

Final Thoughts

This journey has been a testament to the power of combining cutting-edge tools and technologies to push the boundaries of what’s possible in programming and video editing. By integrating ffmpeg’s robust capabilities with the advanced assistance of ChatGPT, we have created a highly efficient and effective screen recording setup tailored for coding sessions and beyond.

The exploration and experimentation with these tools have not only enhanced my technical skills but also opened up new avenues for creative and efficient workflows. I look forward to continuing this journey, leveraging the latest advancements in technology to further refine and optimize my processes.

Thank you for joining me on this deep dive into screen recording with ffmpeg and ChatGPT. I hope you found these insights and techniques as valuable and exciting as I did. Happy recording!