2. Implementing the CLIP method

Code example

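import clip
import torch
from PIL import Image

# Load the model and its matching image preprocessing function
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Prepare the image and the candidate texts.
# The softmax below runs across the texts, so a single prompt would always
# score 1.0; pass at least two candidate descriptions (placeholders here).
image = preprocess(Image.open("path_to_image.jpg")).unsqueeze(0).to(device)
text = clip.tokenize([
    "a description of what you are looking for",
    "a description of something else",
]).to(device)

# Compute features and compare
with torch.no_grad():
    # Embeddings, useful if you want to compute similarities yourself
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Similarity logits, shape [number of images, number of texts]
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Match probabilities:", probs)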
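
To make the comparison step less of a black box, here is a minimal dependency-free sketch of what happens between the embeddings and the probabilities: cosine similarity between the image vector and each text vector, multiplied by a scale factor, then a softmax across the candidate texts. The helper names, the toy 3-dimensional vectors, and the scale of 100.0 are assumptions made for illustration; real CLIP embeddings have hundreds of dimensions and the model carries its own learned scale.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Dot product of the two unit-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def match_probabilities(image_vec, text_vecs, logit_scale=100.0):
    """Hypothetical helper: scaled cosine similarities turned into probabilities."""
    logits = [logit_scale * cosine_similarity(image_vec, t) for t in text_vecs]
    return softmax(logits)

# Toy embeddings, made up for illustration only
image_vec = [0.9, 0.1, 0.0]
text_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

probs = match_probabilities(image_vec, text_vecs)
single = match_probabilities(image_vec, text_vecs[:1])

print(probs)   # the first text is far closer to the image
print(single)  # with only one candidate the softmax can only return 1.0
```

Because the softmax normalizes across the candidate texts, the probabilities only say which description fits best relative to the others. To judge a single description on its own, compare the raw cosine similarity against a threshold instead.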

Methods of the Python clip module

model

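As in the example above, model is the pretrained ViT-B/32 model returned by clip.load. It is a PyTorch module that exposes encode_image and encode_text.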

preprocess

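Converts input image data into the format the model expects. This typically includes resizing, cropping, and normalizing the color channels.
• Image processing: the function takes a PIL image object and converts it into a tensor the model can process, making sure the image size, color channels, and value range match the model's input requirements.
• Output: the preprocess function returns a PyTorch tensor (Tensor) ready to be fed to the network. For ViT-B/32 its shape is [3, 224, 224]; unsqueeze(0) in the example above adds the batch dimension.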
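
The normalization step is the least visible part of preprocess, so here is a small sketch of what it does to a single pixel: scale the 8-bit values to [0, 1], then subtract a per-channel mean and divide by a per-channel standard deviation. The constants are the ones I believe the OpenAI CLIP repository uses; treat them as an assumption and check them against your installed version. The normalize_pixel helper is hypothetical and not part of the clip module.

```python
# Per-channel statistics assumed to match the OpenAI CLIP preprocessing
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the normalized values the model sees."""
    scaled = [c / 255.0 for c in rgb]  # 0..255 -> 0.0..1.0
    return [(v - m) / s for v, m, s in zip(scaled, CLIP_MEAN, CLIP_STD)]

black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))

print(black)  # all channels negative
print(white)  # all channels positive
```

In the real pipeline this runs over every pixel of the resized and cropped image at once, as a tensor operation.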