SAFURAI-CSHARP: Harnessing Synthetic Data to improve Language-specific Code LLM

Leonardo Boiardi, 2023-10-23

The Greatest C# Code LLM for everyone! Learn more about Safurai-Csharp in this blog and the arXiv Paper.

HOW TO USE IT

Getting started with Safurai-Csharp is straightforward and accessible for developers of all skill levels. If you're looking to integrate our advanced C# language model into your workflow, it's as simple as utilizing Google Colab. It provides an efficient platform for you to run the model without the need for any complex setup or infrastructure.

To jump into the action, visit the links to our model variants and choose the one that fits your requirements:

Once you’ve selected your preferred model variant, incorporating the model into your projects is a streamlined process. Whether you're aiming to enhance your development workflow or adopt the model within your educational curriculum, Safurai-Csharp stands ready to assist.

Example code for your IPYNB file...

Installing dependencies and imports

!pip install git+https://github.com/huggingface/transformers.git@refs/pull/25740/head accelerate

from transformers import AutoTokenizer
import transformers
import torch

Model download and setup - you can use TGI for faster inference time

model = "Safurai/Safurai-Csharp-34B-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

Model testing and parameter tuning

system = "You are an expert c# programmer, please provide complete and useful answers."
user = "Write the code for an hello world script"
prompt = f"<s><<SYS>>\n{system}\n<</SYS>>\n\n{user}"
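If you plan to send several prompts, it may help to wrap the template in a small function. The sketch below only reproduces the prompt layout used in the example above; the name `build_prompt` is a hypothetical helper introduced for illustration and is not part of the Safurai or transformers APIs.

```python
def build_prompt(system: str, user: str) -> str:
    """Hypothetical helper: format a system and a user message
    with the same layout as the example prompt in this post."""
    return f"<s><<SYS>>\n{system}\n<</SYS>>\n\n{user}"


# Example usage with the same messages as in the post.
example_prompt = build_prompt(
    "You are an expert c# programmer, please provide complete and useful answers.",
    "Write the code for an hello world script",
)
print(example_prompt)
```

The returned string can be passed to the pipeline call in place of the hand-written `prompt` variable.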

sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
    add_special_tokens=False,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")
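By default, the transformers text-generation pipeline returns the prompt followed by the completion in `generated_text`, so the printed result will most likely repeat the system and user messages. A minimal sketch for keeping only the model's answer, assuming that default behavior, is shown below; `strip_prompt` is a hypothetical helper name, not something provided by the library.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Hypothetical helper: drop the leading prompt from a generation,
    if it is present, and trim surrounding whitespace."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fall back to the full text when the prompt is not echoed back.
    return generated_text.strip()


# Stand-in strings so the sketch runs without downloading the model.
demo_prompt = "<s><<SYS>>\nsystem text\n<</SYS>>\n\nuser text"
demo_output = demo_prompt + "\nConsole.WriteLine(\"Hello, World!\");"
print(strip_prompt(demo_output, demo_prompt))
```

In the notebook, this would be applied as `strip_prompt(seq['generated_text'], prompt)` inside the loop.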

WHAT IS SAFURAI-CSHARP

Safurai-Csharp represents the cutting edge of AI-driven code generation, specially tailored for the C# programming language. Provided under an open-source license, our model is meticulously trained to assist developers and learners alike by offering unparalleled assistance in code generation, explanation, and debugging tasks.

At its core, Safurai-Csharp is built to understand and respond to both code-related and natural language prompts. Whether you need a snippet of code to solve a specific problem or an in-depth description of a complex C# function, Safurai-Csharp can handle it. It scores 56.33% on the Manual MultiPL-E benchmark for C#, the leading result among open-source models, which highlights its proficiency across a variety of coding tasks.

Click the arXiv logo to read the paper

OUR USE OF SYNTHETIC DATASETS

At Safurai, we recognize the crucial role of datasets in training effective AI models. That's why we have focused our expertise on synthetic data generation for instruction datasets, a strength we're eager to share. Synthetic datasets allow us to tailor our training process to specific development benchmarks without compromising on quality or privacy.

Our team’s prowess in synthetic data generation has armed Safurai-Csharp with diverse and rich instructional data, enhancing the model's ability to understand and generate C# code with superior accuracy. By combining our synthetic dataset expertise with the robustness of Safurai-Csharp, we are not only showcasing our technical strength but also setting a new precedent in AI instruction and its application across industry sectors.

For more technical information about Safurai-Csharp, read the Short Research Paper at this link or send us an email with your questions.

