VeriServe Navigation, March 2026 issue
References for "QA4AIDD: A Quality Assurance Framework for AI-Driven Development"
- [1]Google Cloud. 2025 DORA State of AI-assisted Software Development Report. https://cloud.google.com/resources/content/2025-dora-ai-assisted-software-development-report. (Accessed on October 23, 2025).
- [2]H. Nasir. (April 30, 2025). Microsoft's CEO reveals that AI writes up to 30% of its code — some projects may have all of its code written by AI. https://www.tomshardware.com/tech-industry/artificial-intelligence/microsofts-ceo-reveals-that-ai-writes-up-to-30-percent-of-its-code-some-projects-may-have-all-of-its-code-written-by-ai. (Accessed on October 23, 2025).
- [3]H. Langley. (June 12, 2025). Sundar Pichai says AI is making Google engineers 10% more productive. Here's how it measures that. https://www.businessinsider.com/ai-google-engineers-coding-productive-sundar-pichai-alphabet-2025-6. (Accessed on October 23, 2025).
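- [4]K. Robson. (May 29, 2025). AIが人間の仕事を担う時代──Anthropic初の開発者会議でCEOが語ったこと [The era in which AI takes on human work: what the CEO said at Anthropic's first developer conference]. https://wired.jp/article/anthropic-first-developer-conference/. (Accessed on October 23, 2025).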
- [5]I. O’Sullivan. (September 22, 2025). These Companies Have Already Replaced Workers with AI in 2025. https://tech.co/news/companies-replace-workers-with-ai#:~:text=,to%20follow%20up%20every%20call. (Accessed on October 23, 2025).
- [6]K. Lazar. (September 18, 2025). College graduates struggle to navigate job market with the rise of AI. https://www.cbsnews.com/losangeles/news/college-graduates-entry-level-jobs-market-ai/#:~:text=Only%2030,level%20hiring. (Accessed on October 23, 2025).
- [7]A. Bantok. (May 20, 2025). The SignalFire State of Tech Talent Report – 2025. https://www.signalfire.com/blog/signalfire-state-of-talent-report-2025#:~:text=Everyone%20took%20a%20hit%20in,getting%20deeper%20for%20new%20grads. (Accessed on October 23, 2025).
- [8]Teikoku Databank (帝国データバンク). (August 1, 2024). TDB Business View: 生成 AI の活用状況調査 [Survey on the Use of Generative AI]. https://www.tdb.co.jp/report/economic/2rwpbngj_lop/. (Accessed on October 23, 2025).
- [9]OpenAI Platform. Models. https://platform.openai.com/docs/models. (Accessed on October 23, 2025).
- [10]Yan, K., Guo, H., Shi, X., Xu, J., Gu, Y., & Li, Z. (2025). Codeif: Benchmarking the instruction-following capabilities of large language models for code generation. arXiv preprint arXiv:2502.19166.
- [11]Qi, Y., Peng, H., Wang, X., Xin, A., Liu, Y., Xu, B., ... & Li, J. (2025). Agentif: Benchmarking instruction following of large language models in agentic scenarios. arXiv preprint arXiv:2505.16944.
- [12]Wen, B., Ke, P., Gu, X., Wu, L., Huang, H., Zhou, J., ... & Huang, M. (2024). Benchmarking complex instruction-following with multiple constraints composition. Advances in Neural Information Processing Systems, 37, 137610-137645.
- [13]Jiang, Y., Wang, Y., Zeng, X., Zhong, W., Li, L., Mi, F., ... & Wang, W. (2023). Followbench: A multi-level fine-grained constraints following benchmark for large language models. arXiv preprint arXiv:2310.20410.
- [14]Duan, G., Liu, M., Wang, Y., Wang, C., Peng, X., & Zheng, Z. (2025). A Hierarchical and Evolvable Benchmark for Fine-Grained Code Instruction Following with Multi-Turn Feedback. arXiv preprint arXiv:2507.00699.
- [15]Mancoridis, M., Weeks, B., Vafa, K., & Mullainathan, S. (2025). Potemkin Understanding in Large Language Models. arXiv preprint arXiv:2506.21521.
- [16]METR. (June 5, 2025). Recent Frontier Models Are Reward Hacking. https://metr.org/blog/2025-06-05-recent-reward-hacking/. (Accessed on October 23, 2025).
- [17]D. Kokotajlo. (June 10, 2025). METR's Observations of Reward Hacking in Recent Frontier Models. https://www.lesswrong.com/posts/Zu4ai9GFpwezyfB2K/metr-s-observations-of-reward-hacking-in-recent-frontier. (Accessed on October 23, 2025).
- [18]Lu, Y., Cheng, J., Zhang, Z., Cui, S., Wang, C., Gu, X., ... & Huang, M. (2025). LongSafety: Evaluating Long-Context Safety of Large Language Models. arXiv preprint arXiv:2502.16971.
- [19]Goetz, S., & Schaad, A. (2024). "You still have to study" -- On the Security of LLM generated code. arXiv preprint arXiv:2408.07106.
- [20]Ouédraogo, W. C., Kaboré, K., Li, Y., Tian, H., Koyuncu, A., Klein, J., ... & Bissyandé, T. F. (2024). Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation. arXiv preprint arXiv:2407.00225.
- [21]Tihanyi, N., Bisztray, T., Ferrag, M. A., Jain, R., & Cordeiro, L. C. (2025). How secure is AI-generated code: A large-scale comparison of large language models. Empirical Software Engineering, 30(2), 47.
- [22]Liu, Y., Le-Cong, T., Widyasari, R., Tantithamthavorn, C., Li, L., Le, X. B. D., & Lo, D. (2024). Refining chatgpt-generated code: Characterizing and mitigating code quality issues. ACM Transactions on Software Engineering and Methodology, 33(5), 1-26.
- [23]Huynh, N., & Lin, B. (2025). Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications. arXiv preprint arXiv:2503.01245.
- [24]Karanikiotis, T., & Symeonidis, A. L. (2025). A Data‐Driven Methodology for Quality Aware Code Fixing. IET Software, 2025(1), 4147669.
- [25]Kharma, M., Choi, S., AlKhanafseh, M., & Mohaisen, D. (2025). Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis. arXiv preprint arXiv:2502.01853.
- [26]Sonar. (October 2025). The Coding Personalities of Leading LLMs – A State of Code Report. https://www.sonarsource.com/the-coding-personalities-of-leading-llms.pdf. (Accessed on October 23, 2025).
- [27]Google Cloud. Accelerate State of DevOps. https://dora.dev/research/2024/dora-report/2024-dora-accelerate-state-of-devops-report.pdf. (Accessed on October 23, 2025).
- [28]Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
- [29]GitHub Blog. How to create issues and pull requests in record time on GitHub. https://github.blog/developer-skills/github/how-to-create-issues-and-pull-requests-in-record-time-on-github/. (Accessed on October 23, 2025).