Deploying a Local ChatGPT on an M1 Mac with the LLaMA Large Language Model

  OpenAI's GPT-based ChatGPT has been enjoying unrivaled glory; we watched it raise its crimson towers, we watched it feast its guests, and Facebook finally could not sit still and released a large language model (LLM) of its own, LLaMA, said to come in four parameter scales: 7 billion, 13 billion, 33 billion, and 65 billion. Parameters are the adjustable variables in a neural network, such as its weights and biases, which are tuned during training to optimize the network's performance; "7 billion" means the network contains 7 billion such parameters, and so on up the scale.
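  To make "parameter" concrete, consider a single fully connected layer. A small illustrative Python calculation (the 4096 width below matches the hidden size that appears in the 7B conversion log later in this article; the layer itself is just an example):

in_features, out_features = 4096, 4096  # width of one hidden layer (illustrative)
weights = in_features * out_features    # one weight per input-output connection
biases = out_features                   # one bias per output neuron
print(f"{weights + biases:,}")          # 16,781,312 parameters in this single layer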
  In some large neural networks, each parameter is stored as a 32-bit or 64-bit floating-point number, meaning each parameter occupies 4 or 8 bytes of storage. A network with 7 billion parameters therefore needs roughly 28 GB or 56 GB for its weights alone.
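  Those figures are easy to check with a back-of-the-envelope calculation. A minimal Python sketch (the nominal 7-billion count comes from the model name; the true count is slightly lower):

n_params = 7_000_000_000  # the "7B" in LLaMA 7B (nominal)
for dtype, nbytes in [("float64", 8), ("float32", 4), ("float16", 2)]:
    # storage = parameter count x bytes per parameter
    print(f"{dtype}: {n_params * nbytes / 1e9:.0f} GB")
# float64: 56 GB
# float32: 28 GB
# float16: 14 GB, close to the ~13.5 GB weight file downloaded below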
  Moreover, a network's footprint is determined not just by the raw parameter count but also by its structure, such as the number of neurons, the number of layers, and other architectural details, so a 7-billion-parameter network may occupy even more space in practice, depending on how it is built and implemented.
  A model of this size is therefore a serious mouthful for a single machine, so for this test we use the smallest one, the LLaMA 7B model.

LLaMA Project Installation and Model Configuration
  Much like the Stable Diffusion project, the LLaMA project Facebook open-sourced is hard-coded to run in CUDA mode, which means an NVIDIA GPU is required for training and inference. Fortunately, Georgi Gerganov rewrote the LLaMA project in C++ as llama.cpp, a port that runs on the CPU.
  llama.cpp targeted Apple's M-series chips first, which is undoubtedly great news for Apple fans. Start by pulling the C++ version of the LLaMA project:

git clone https://github.com/ggerganov/llama.cpp
  Then enter the project directory:

cd llama.cpp
  Inside the project, a dedicated model folder named models is needed:

mkdir models
  Next, download the LLaMA 7B model files from the Hugging Face site: https://huggingface.co/nyanko7/LLaMA-7B/tree/main
  Yes, the main model file alone weighs in at a full 13.5 GB, so download with caution if local disk space is running low.
  Then create the model subdirectory 7B inside models:

mkdir 7B
  Place tokenizer.model and tokenizer_checklist.chk in the directory alongside 7B, i.e. directly under models:

➜  models git:(master) ✗ ls
7B                      tokenizer.model         tokenizer_checklist.chk
  Then place checklist.chk, consolidated.00.pth, and params.json inside the 7B directory:

➜  7B git:(master) ✗ ls
checklist.chk       consolidated.00.pth  params.json
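  Before moving on, it can be worth confirming the layout programmatically, since the conversion step below fails if anything is misplaced. A minimal Python sketch, run from the llama.cpp root, using only the file names listed above:

import os

expected = [
    "models/tokenizer.model",
    "models/tokenizer_checklist.chk",
    "models/7B/checklist.chk",
    "models/7B/consolidated.00.pth",
    "models/7B/params.json",
]
for path in expected:
    # Flag anything missing before conversion fails halfway through.
    print(f"{path}: {'ok' if os.path.exists(path) else 'MISSING'}")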
  With that, the model is configured.

LLaMA Model Conversion
  Since we are not using Facebook's original project, its model still has to be converted, that is, turned into a model the current C++ version of LLaMA can run.
  The conversion is performed with a Python script:

python3 convert-pth-to-ggml.py models/7B/ 1
  The first argument is the directory containing the model; the second is the floating-point type to use during conversion. With float32 the resulting file is twice as large; a value of 1 selects float16, the default, which is the data type we use here.
  Program output:

➜  llama.cpp git:(master) ✗ python convert-pth-to-ggml.py models/7B/ 1
{"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}
n_parts = 1
Processing part 0
Processing variable: tok_embeddings.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: output.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
Processing variable: layers.0.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
Processing variable: layers.0.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
Processing variable: layers.0.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
[the same nine-variable pattern repeats for layers.1 through layers.31]
Done. Output file: models/7B//ggml-model-f16.bin, (part 0)
  As you can see, when the conversion succeeds, a ggml-model-f16.bin model file that the C++ code can load is generated in the models/7B/ directory.

LLaMA Model Invocation
  Now the converted model can be invoked. First, compile the C++ project:

make
  Program output:

➜  llama.cpp git:(master) ✗ make
I llama.cpp build info:
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX:      Apple clang version 14.0.0 (clang-1400.0.29.202)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -pthread -DGGML_USE_ACCELERATE   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread -c utils.cpp -o utils.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread main.cpp ggml.o utils.o -o main  -framework Accelerate
./main -h
usage: ./main [options]

options:
  -h, --help            show this help message and exit
  -i, --interactive     run in interactive mode
  -ins, --instruct      run in instruction mode (use with Alpaca models)
  -r PROMPT, --reverse-prompt PROMPT
                        in interactive mode, poll user input upon seeing PROMPT (can be
                        specified more than once for multiple prompts).
  --color               colorise output to distinguish prompt and user input from generations
  -s SEED, --seed SEED  RNG seed (default: -1)
  -t N, --threads N     number of threads to use during computation (default: 4)
  -p PROMPT, --prompt PROMPT
                        prompt to start generation with (default: empty)
  --random-prompt       start with a randomized prompt.
  -f FNAME, --file FNAME
                        prompt file to start generation.
  -n N, --n_predict N   number of tokens to predict (default: 128)
  --top_k N             top-k sampling (default: 40)
  --top_p N             top-p sampling (default: 0.9)
  --repeat_last_n N     last n tokens to consider for penalize (default: 64)
  --repeat_penalty N    penalize repeat sequence of tokens (default: 1.3)
  -c N, --ctx_size N    size of the prompt context (default: 512)
  --ignore-eos          ignore end of stream token and continue generating
  --memory_f16          use f16 instead of f32 for memory key+value
  --temp N              temperature (default: 0.8)
  -b N, --batch_size N  batch size for prompt processing (default: 8)
  -m FNAME, --model FNAME
                        model path (default: models/llama-7B/ggml-model.bin)

c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread quantize.cpp ggml.o utils.o -o quantize  -framework Accelerate
  After a successful build, a main executable is generated in the project directory.
  Then, following the help text printed by the build, invoke the model directly:

./main -m ./models/7B/ggml-model-f16.bin -p "Hi i am "
  Program output:

➜  llama.cpp git:(master) ✗ ./main -m ./models/7B/ggml-model-f16.bin -p "hi i am"
main: seed = 1679400707
llama_model_load: loading model from "./models/7B/ggml-model-f16.bin" - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 1
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 13365.09 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from "./models/7B/ggml-model-f16.bin"
llama_model_load: .................................... done
llama_model_load: model size = 12853.02 MB / num tensors = 291

system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |

main: prompt: " hi i am"
main: number of tokens in prompt = 6
     1 -> ""
 13450 -> " hi"
   423 -> "i"
 25523 -> " am"

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

hi i am a pythoner, but sunk to become a ruby
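  One-shot prompts are not the only option: the help text from the build step above also documents an interactive mode (-i), a reverse prompt (-r), and a thread count (-t). Assuming those flags behave as documented there, a chat-style session might look something like this (the flag values are illustrative, not tuned): ./main -m ./models/7B/ggml-model-f16.bin -t 8 -n 256 --color -i -r "User:" -p "User: hi, who are you?"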
  Frankly, the inference speed leaves a lot to be desired, though that may simply be because this writer's machine is underpowered.
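  One possible remedy is quantization: the make step above also compiled a quantize binary. In llama.cpp builds from this period it could shrink the f16 model down to 4-bit weights, cutting the file to roughly 4 GB and speeding up CPU inference at some cost in output quality. A hedged example, assuming the tool's usage at the time (the trailing 2 selects the q4_0 format):

./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
./main -m ./models/7B/ggml-model-q4_0.bin -p "Hi i am "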
Conclusion

  On the whole, the LLaMA 7B model needs its prompts (prompt) in pure English; its grasp of Chinese is still insufficient. Its advantage is that it really does run on a single machine, and running locally cuts out the network data-transfer step, so end-to-end inference is naturally more responsive. For the average AI enthusiast, that is quite enough.
