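How to do binary classification of Chinese text with Python and BERT?

Excitement

Last year, the moment Google released its BERT model, I was thrilled.

At the time I was using fast.ai's ULMFiT for natural language classification tasks (I even wrote an article, "How to Do Text Classification with Python and Deep Transfer Learning?", to share with you). ULMFiT and BERT are both pre-trained language models (Pre-trained Language Modeling) and have a great deal in common.

A language model, in this sense, is a deep neural network trained on massive amounts of text in order to capture the general features of a language.

Work like this can usually be done only by large organizations, because it is simply too expensive. The costs include, but are not limited to:

- storing the data
- buying (or even developing) computing hardware
- training the model (which takes days or even months)
- hiring specialists
- …

Pre-training means that once these organizations have finished training, they release the result. Ordinary people and small organizations can then borrow that result and fine-tune it on text from their own specialized domain, so that the model gains a very clear understanding of text in that domain.

"Understanding" here mainly means that if you mask certain words, the model can guess fairly accurately what you hid. Beyond that, if you put two sentences together, the model can judge whether one directly follows the other in context.

Is this kind of "understanding" useful? Of course it is.

BERT has been tested on many natural language tasks, and quite a few of its results have already surpassed human contestants.

The tasks BERT can help with naturally include text classification, such as sentiment classification. That is also the problem I am currently working on.

Pain points

Yet I waited a long time before I could actually use BERT.

Google's official code was released long ago. Even the PyTorch implementation has gone through many iterations.

But whenever I opened the examples they provide, my head spun.

The line count alone is intimidating.

On top of that, the pile of data processing pipelines (Data Processor) are all named after datasets. My data does not belong to any of them, so which one should I use?

Then there are countless baffling flags, which are just as much of a headache to read.

Let's make a comparison. For the same job of doing …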
| Field | Value |
| --- | --- |
| Stars | 368 |
| Forks | 111 |
| Language | Jupyter Notebook |
| Category | AI Tool |
| Quality Score | 52.9965059207684/100 |
| Open Issues | 18 |
| Last Updated | 2021-07-13 |
| Created | 2019-04-07 |
| Est. Tokens | ~18k |
demo-chinese-text-binary-classification-with-bert is an open-source AI tool by wshuyi with 368 GitHub stars and 111 forks.
It is primarily written as Jupyter Notebooks and covers topics such as BERT, classification, and NLP.
Installation instructions and usage details are in the project's GitHub repository at github.com/wshuyi/demo-chinese-text-binary-classification-with-bert.
The top alternatives to demo-chinese-text-binary-classification-with-bert on Agent Skills Hub include LLMCompiler, LLM-Finetuning-Toolkit, and Awesome-LLM-Eval. They can be compared side by side by stars, quality score, and community activity.