Zhipu AI, a Chinese provider of AI models, has announced the free release of "GLM-4-Flash," a large language model (LLM) aimed at simple vertical tasks that require fast responses at low cost.
The model supports multi-turn dialogue, web browsing, function calling, and long-text reasoning with a context window of up to 128K tokens. It generates text at 72.14 tokens per second (~115 characters per second) and supports 26 languages, including Chinese, English, Japanese, Korean, and German.
The company claims that GLM-4-Flash achieves higher concurrency and throughput while lowering inference costs, through techniques applied at the inference level: adaptive weight quantization, multiple parallelization methods, batching strategies, and speculative sampling.
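For readers who want to try the model, a call would typically go through a chat-completions-style API. The sketch below is a minimal illustration, assuming an OpenAI-compatible endpoint; the base URL, model identifier (`glm-4-flash`), and auth scheme are assumptions for illustration, not details confirmed by the announcement. It builds a multi-turn request (prior turns ride along in the `messages` list) without sending it.

```python
import json
import urllib.request

# Assumed values for illustration only -- check Zhipu AI's official
# API documentation for the real endpoint, model name, and auth scheme.
API_BASE = "https://open.bigmodel.cn/api/paas/v4"  # assumed endpoint
API_KEY = "your-api-key"  # placeholder


def build_chat_request(messages, model="glm-4-flash", temperature=0.7):
    """Build (but do not send) a chat-completions request payload."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
    }
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    return req, payload


# Multi-turn dialogue: each earlier turn is passed back in the list.
messages = [
    {"role": "user", "content": "Summarize this contract clause."},
    {"role": "assistant", "content": "It limits liability to direct damages."},
    {"role": "user", "content": "Translate that summary into Japanese."},
]
req, payload = build_chat_request(messages)
print(payload["model"])  # glm-4-flash
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would return the model's reply; the payload shape shown here is the common convention for OpenAI-compatible APIs, not Zhipu-specific documentation.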