🔗 LINKING SCIENTIFIC TEXT TO THE EXACT CODE THAT IMPLEMENTS IT

A paper from today's arXiv new submissions - from the University of Arizona and Lex Machina - introduces a new task: bidirectional small-granularity search between code and text. The idea is to link specific sentences in scientific publications directly to the precise code segments that implement the described method, and vice versa. Not at the function or file level, but at the snippet level: matching a 2-3 sentence description to the 5-10 lines of code it corresponds to.

The motivation is practical. Reading a machine learning paper and finding the corresponding implementation in a public repository typically requires either already knowing the codebase or spending significant time searching. The reverse problem - reading someone else's code and finding the paper section that explains what it's doing - is, if anything, harder. A system that can bridge those two artefacts at fine granularity could significantly reduce the friction of reproducing and building on published research.

The dataset the authors introduce includes descriptions auto-generated by GPT-4 for training and manually annotated out-of-domain (OOD) test sets. The gap between in-domain and OOD performance should show how well current retrieval approaches actually generalise here, as opposed to memorising surface patterns.

It's a small paper, but it defines a task that's likely to matter a great deal as AI-assisted research becomes more common.

Link to Asteris on my profile.

Sources:
- https://lnkd.in/eghkvptC

#LLMResearch #CodeAI #ScientificResearch #NLP #AIResearch
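To make the task concrete, here is a minimal Python sketch of bidirectional snippet-level retrieval under toy assumptions. It is not the authors' method: where a real system would use a trained text/code encoder, this stand-in scores pairs with bag-of-words cosine similarity over identifiers split into words. The example sentences and snippets, and the helper names (`tokenize`, `embed`, `cosine`, `rank`), are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Split prose and camelCase/snake_case identifiers into lowercase word tokens."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)  # "learningRate" -> "learning Rate"
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def embed(text: str) -> Counter:
    """Stand-in 'encoder': a term-frequency vector (a real system would use a trained model)."""
    return Counter(tokenize(text))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query: str, candidates: list[str]) -> list[int]:
    """Candidate indices sorted from most to least similar to the query."""
    q = embed(query)
    scores = [cosine(q, embed(c)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

# Hypothetical paper sentences and code snippets.
sentences = [
    "We apply dropout to the attention weights before the softmax.",
    "The learning rate is decayed with a cosine schedule.",
]
snippets = [
    "lr = base_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))  # cosine learning rate decay",
    "attn_weights = dropout(attn_weights)\nattn = softmax(attn_weights)",
]

text_to_code = [rank(s, snippets)[0] for s in sentences]  # best snippet for each sentence
code_to_text = [rank(c, sentences)[0] for c in snippets]  # best sentence for each snippet
print(text_to_code, code_to_text)  # [1, 0] [1, 0]
```

The same scoring function serves both directions, which is what makes the task bidirectional. A lexical matcher like this is the kind of surface-pattern approach that out-of-domain evaluation is designed to stress.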
Asteris AI
Marketing Services
Accessible, affordable AI for SMEs. Turn photos into on-brand posts fast, beat creative block, stay consistent with ease
About us
Our Mission
------------
We’re AI natives on a mission to make AI accessible and affordable for small and medium businesses. We’re passionate about AI’s potential to boost productivity, without losing the human touch.

Asteris helps retail teams maintain a consistent social media presence with minimal effort and cost, while overcoming creative block. Our AI works with your original content, your media, and your brand tone to craft both strong copy and rich media posts that retain your identity.

We’re firm believers in human-in-the-loop AI: you stay in control, your voice stays authentic, and we work hard to ensure the output feels human, not robotic.

Learn more:
• https://asteris.ai/

Follow us on:
• LinkedIn: https://www.linkedin.com/company/asteris-ai/
• Instagram: https://www.instagram.com/asteris_ai/
• X: https://x.com/asteris_ai
- Website
- https://asteris.ai/
- Industry
- Marketing Services
- Company size
- 2-10 employees
- Type
- Privately Held
- Founded
- 2025
- Specialties
- Marketing, AI, Software, SaaS, and Product
Updates
-
⚠️ NEW RESEARCH: FINE-TUNING LLMS ON DOMAIN DATA CHANGES HOW THEY HALLUCINATE, NOT WHETHER THEY DO

A study from Georgia Tech in today's arXiv submissions examines what happens to hallucinations when you fine-tune Llama-2 on domain-specific data. The finding is more specific - and more useful - than "fine-tuning helps with some things and hurts with others."

What they found is a consistent pattern: the model gets better at tasks similar to its training examples, but hallucinates more confidently on novel queries within the same domain. The failure mode is over-generation - padding otherwise correct answers with extra, fabricated detail - rather than refusing to answer.

This matters for any organisation that has fine-tuned a model on its own data and declared it "domain-adapted." Benchmark performance on held-out examples from the training distribution may genuinely be better. But the model's behaviour on queries unlike anything in its training data - the actual deployment scenario - may be worse than the baseline in ways that are hard to detect. A model that says "I don't know" is easy to handle. A model that confidently says something plausible and wrong is much more dangerous.

The broader implication is that fine-tuning evaluation needs to include out-of-distribution testing as a standard component, not an afterthought. If your domain-specific model only outperforms the baseline on in-distribution examples, you haven't adapted it - you've just over-fitted it to your labelling choices.

Sources:
- https://lnkd.in/eePyS_wS

#LLMResearch #Hallucination #FineTuning #AIReliability #EnterpriseAI
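A minimal sketch of the kind of evaluation the last paragraph argues for: report outcomes separately for in-distribution and out-of-distribution splits, track over-generation alongside refusals, and only call a model adapted if it beats the baseline on both. The outcome labels, split names, helper names and toy results are made up for illustration; they are not the study's protocol or data.

```python
from dataclasses import dataclass

# Hypothetical outcome labels; "over_generated" = right core answer plus fabricated detail.
OUTCOMES = ("correct", "over_generated", "wrong", "refused")

@dataclass
class Result:
    split: str    # "in_dist" or "ood" (hypothetical split names)
    outcome: str  # one of OUTCOMES

def summarise(results: list[Result]) -> dict[str, dict[str, float]]:
    """Rate of each outcome, computed separately for every split."""
    summary: dict[str, dict[str, float]] = {}
    for split in sorted({r.split for r in results}):
        rows = [r for r in results if r.split == split]
        summary[split] = {o: sum(r.outcome == o for r in rows) / len(rows) for o in OUTCOMES}
    return summary

def is_adapted(tuned: dict, base: dict) -> bool:
    """Treat a model as domain-adapted only if it beats the baseline on BOTH splits."""
    return all(tuned[s]["correct"] > base[s]["correct"] for s in ("in_dist", "ood"))

def make(split: str, outcomes: list[str]) -> list[Result]:
    return [Result(split, o) for o in outcomes]

# Made-up toy results, chosen to mirror the pattern described in the post.
base = summarise(make("in_dist", ["correct", "correct", "wrong", "refused"])
                 + make("ood", ["correct", "refused", "refused", "wrong"]))
tuned = summarise(make("in_dist", ["correct", "correct", "correct", "wrong"])
                  + make("ood", ["correct", "over_generated", "over_generated", "wrong"]))

print(tuned["in_dist"]["correct"], base["in_dist"]["correct"])  # 0.75 0.5
print(tuned["ood"]["over_generated"], tuned["ood"]["refused"])   # 0.5 0.0
print(is_adapted(tuned, base))                                   # False
```

In this toy run the tuned model looks better in-distribution but is no better out of distribution, and it has traded refusals for over-generated answers, so `is_adapted` returns False.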
-
⚙️ GRAPHLORA BRINGS GRAPH STRUCTURE INSIDE THE FINE-TUNING PROCESS FOR RECOMMENDATION

A paper accepted to ACL 2026 Findings, in today's arXiv submissions, introduces GraphLoRA - a variant of LoRA (low-rank adaptation) that embeds a graph message-passing network directly inside the fine-tuning pathway for LLM-based recommendation systems.

The core insight is that current approaches treat graph structure as static input: you describe the collaborative relationships in the prompt, or inject pre-trained embeddings, and the LLM has to figure out what to do with them. GraphLoRA makes the structural signals actively participate in the parameter updates themselves.

The recommendation problem is a useful test case for a broader challenge: how do you get LLMs to reason about relational data? Language models are trained on text, which is fundamentally sequential. Graphs encode relationships that don't have a natural text representation - the "you bought X, users similar to you also bought Y" signal that collaborative filtering depends on is a second-order structural property, not something you can easily write in a sentence. GraphLoRA's approach of propagating graph topology through the LoRA adapter weights is a practical solution to that mismatch.

For anyone building recommender systems on top of foundation models, this is worth reading. The performance improvements on standard benchmarks are solid, and the design principle - don't describe structure, propagate it - may have applications beyond recommendation.

Sources:
- https://lnkd.in/eSbez6tk

#LLMResearch #RecommendationSystems #GraphNeuralNetworks #FineTuning #NLP
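To illustrate "don't describe structure, propagate it", here is a minimal NumPy sketch of a LoRA-style layer whose rank-r activations are averaged over graph neighbours before the up-projection. This is an assumption-laden toy, not GraphLoRA's published architecture: the mean aggregation, the placement of the message-passing step between the A and B matrices, the shapes, and the names `normalized_adjacency` and `graph_lora_forward` are all hypothetical.

```python
import numpy as np

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Add self-loops, then row-normalise so each row averages a node and its neighbours."""
    a_hat = adj + np.eye(adj.shape[0])
    return a_hat / a_hat.sum(axis=1, keepdims=True)

def graph_lora_forward(x, w_frozen, a, b, adj, scale=1.0):
    """Frozen projection plus a low-rank update whose rank-r activations are mixed over the graph.

    x:        (n_nodes, d_in)    one hidden state per user/item node (hypothetical setup)
    w_frozen: (d_in, d_out)      pre-trained weight, never updated
    a:        (d_in, r)          LoRA down-projection (trainable)
    b:        (r, d_out)         LoRA up-projection (trainable; zero at init, as in standard LoRA)
    adj:      (n_nodes, n_nodes) 0/1 interaction graph
    scale:    scaling factor (standard LoRA uses alpha / r)
    """
    low_rank = x @ a                              # down-project into the rank-r adapter space
    mixed = normalized_adjacency(adj) @ low_rank  # one round of mean-aggregation message passing
    return x @ w_frozen + scale * (mixed @ b)     # frozen path + graph-aware low-rank update

rng = np.random.default_rng(0)
n, d_in, d_out, r = 3, 4, 2, 2
x = rng.normal(size=(n, d_in))
w = rng.normal(size=(d_in, d_out))
a = rng.normal(size=(d_in, r))
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph 0 - 1 - 2

b = np.zeros((r, d_out))
out0 = graph_lora_forward(x, w, a, b, adj)
print(np.allclose(out0, x @ w))  # True: a zero B leaves the frozen layer unchanged

b = rng.normal(size=(r, d_out))  # pretend B has been trained
out1 = graph_lora_forward(x, w, a, b, adj)
x2 = x.copy()
x2[2] += 1.0                     # perturb node 2 only
out2 = graph_lora_forward(x2, w, a, b, adj)
# Node 0 is not adjacent to node 2, so its output is unchanged; node 1 is, so its output moves.
print(np.allclose(out1[0], out2[0]), np.allclose(out1[1], out2[1]))
```

The perturbation check shows the property the post describes: a node's adapter output depends on its neighbours, so structure flows through the parameter update itself rather than being spelled out in the prompt.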
-
🗺️ ABLE BUILDS FINGERPRINTS FOR LLMs THAT WORK ACROSS DIFFERENT ARCHITECTURES

A paper from today's arXiv new submissions introduces ABLE - Attribution-Based Large-model Embedding - a framework for creating model representations that work even when the models being compared have different architectures and tokenisers. The approach uses gradient-based feature attributions rather than output comparisons, capturing how each model responds to its input, not just what it says.

The problem this solves is real and growing. We now have thousands of publicly available LLMs, plus an unknown number of fine-tuned derivatives of proprietary models. Comparing them, auditing their provenance, or detecting whether a suspicious model is a derivative of a known base model is increasingly important for security and compliance. Existing methods either require access to internal parameters (not available for proprietary models) or compare outputs, which fails when two different models produce similar text in response to similar prompts. ABLE's gradient-based approach captures a more fundamental, model-specific signal.

The application areas the paper highlights include provenance auditing, security analysis, and model selection in heterogeneous environments. With model proliferation accelerating and regulatory interest in AI model documentation growing, the ability to reliably identify and map relationships between LLMs is likely to become infrastructure rather than a research exercise.

Sources:
- https://lnkd.in/esR_Uz6x

#LLMResearch #AIInterpretability #ModelSecurity #AIGovernance #NLP
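As a rough illustration of why attributions can compare models that share no internal layout, here is a toy sketch, not ABLE's method. Each "model" is a function over a shared input space, attributions are gradient × input on a fixed set of probe inputs, and the fingerprint is the normalised, concatenated attribution vector. Gradients are estimated by central finite differences only to keep the toy dependency-free; a real system would use autodiff and would also need a way to align inputs across different tokenisers, which is the hard part the paper says it tackles. The probes and the models `base`, `derived` and `unrelated` are hypothetical.

```python
import numpy as np

def attributions(model, probes: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Gradient x input per probe and feature; gradients estimated by central differences."""
    n, d = probes.shape
    attr = np.zeros((n, d))
    for i, x in enumerate(probes):
        for j in range(d):
            step = np.zeros(d)
            step[j] = eps
            grad_j = (model(x + step) - model(x - step)) / (2 * eps)
            attr[i, j] = grad_j * x[j]
    return attr

def fingerprint(model, probes: np.ndarray) -> np.ndarray:
    """Flatten the attribution matrix and L2-normalise it."""
    v = attributions(model, probes).ravel()
    return v / np.linalg.norm(v)

def similarity(m1, m2, probes: np.ndarray) -> float:
    """Cosine similarity between two models' fingerprints on the same probes."""
    return float(fingerprint(m1, probes) @ fingerprint(m2, probes))

rng = np.random.default_rng(0)
d = 5
probes = rng.normal(size=(16, d))   # shared probe inputs (hypothetical)
w = rng.normal(size=d)
delta = 0.05 * rng.normal(size=d)   # small "fine-tuning" shift
w_other = rng.normal(size=d)

def base(x):                        # single-unit toy "architecture"
    return np.tanh(w @ x)

def derived(x):                     # different layout: two hidden units sharing the shifted weights
    hidden = np.stack([0.3 * (w + delta), 0.7 * (w + delta)]) @ x
    return np.tanh(hidden.sum())

def unrelated(x):                   # independently drawn weights
    return np.tanh(w_other @ x)

print(similarity(base, derived, probes), similarity(base, unrelated, probes))
```

The derived model has a different parameter layout, yet its fingerprint should sit much closer to the base model's than the unrelated model's does, because the comparison depends on behaviour on the probes rather than on matching weights.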
-
🔬 NEW RESEARCH FROM META AI: ARE WE OVER-CREDITING LLM POST-TRAINING?

A position paper from researchers at Meta AI and the Hebrew University of Jerusalem - in today's arXiv submissions - makes a provocative argument about the current LLM training paradigm. The core claim: the massive post-training phase (SFT plus RL) that now defines state-of-the-art language models is, functionally, a reversion to the pre-LLM era of supervised fine-tuning. It's distribution fitting, not genuine capability acquisition.

To test this, they trained models from randomly initialised weights - no pre-training at all - on modern reasoning datasets, and evaluated them on competitive maths and code benchmarks. The results were "highly non-trivial": a model that starts from random weights, trained only on the right data distribution, achieves meaningful scores on benchmarks we treat as measures of deep capability. The interpretation the authors draw: if you can get there from random initialisation, the benchmark gains from post-training may be telling us more about the training distribution than about the model's actual reasoning ability.

This is a genuinely uncomfortable finding for anyone who has been citing benchmark improvements as evidence that models are getting meaningfully smarter. It doesn't mean the improvements aren't real - it means we need better tests of whether they generalise beyond the distribution they were trained on. The paper ends with a call to move "beyond extensive post-training" to develop genuinely capable systems, which is easier to say than to specify.

Sources:
- https://lnkd.in/eYCw-HbF

What's your reaction to the idea that SFT + RL might be more about fitting benchmarks than gaining real capability?

#LLMResearch #MachineLearning #AIResearch #RLHF #FrontierAI
-
📊 TINYJUDGE SOLVES A REAL PROBLEM IN TRAINING AI TO FOLLOW INSTRUCTIONS

A paper from today's arXiv listings - accepted to the ACL 2026 main conference - introduces TinyJudge, a framework for evaluating the kinds of instruction-following constraints that current training methods struggle with.

The problem it targets is specific: reinforcement learning with verifiable rewards works well for constraints you can check automatically (output length, format, presence of specific words) but breaks down for "soft" constraints like tone, formality, or style. Existing approaches use large frontier models as judges, which is expensive and slow.

TinyJudge's solution is to distil expertise from frontier models into an ensemble of 0.6-billion-parameter specialist models - tiny by today's standards - each trained to evaluate a specific type of soft constraint. The result is high-precision, low-compute evaluation that can be used as a reward signal during RL training. The paper reports significant improvements over using a frontier LLM as a single judge, and addresses the reward hacking problem that plagues existing approaches.

For anyone fine-tuning or instruction-tuning LLMs in production, this is directly relevant. The cost and latency of using GPT-4 or similar as a judge at training scale is a real constraint. Specialist small models that can do the same job with a fraction of the compute, and with less exposure to reward hacking, are a practically useful development, not just an academic one.

For more AI insights, follow Asteris.

Sources:
- https://lnkd.in/eytmNMxz

#LLMResearch #RLHF #InstructionFollowing #FinetuningLLMs #AITraining
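The reward design described in this post, with code-checkable rewards for hard constraints and specialist judges for soft ones, can be sketched roughly as follows. The "specialist" here is a trivial keyword heuristic standing in for a distilled 0.6B-parameter judge model; the constraint names, the routing table `SOFT_JUDGES`, and the choice to aggregate with the minimum score are assumptions for illustration, not TinyJudge's published design (a weighted mean would be an equally plausible choice).

```python
from typing import Callable

Judge = Callable[[str], float]  # maps a response to a score in [0, 1]

# Hard constraints: checkable with plain code, the "verifiable rewards" case.
def max_words(limit: int) -> Judge:
    return lambda text: 1.0 if len(text.split()) <= limit else 0.0

def must_include(word: str) -> Judge:
    return lambda text: 1.0 if word.lower() in text.lower() else 0.0

# Soft constraint: a toy keyword heuristic standing in for a small specialist judge model.
INFORMAL = {"hey", "gonna", "wanna", "lol"}

def formal_tone_stub(text: str) -> float:
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = sum(w in INFORMAL for w in words)
    return max(0.0, 1.0 - hits / 2)  # each informal word costs half a point

# Hypothetical routing table: one specialist judge per soft-constraint type.
SOFT_JUDGES: dict[str, Judge] = {"formal_tone": formal_tone_stub}

def reward(text: str, hard: list[Judge], soft: list[str]) -> float:
    """Score every constraint, routing soft ones to their specialist, and keep the worst score."""
    scores = [check(text) for check in hard] + [SOFT_JUDGES[name](text) for name in soft]
    return min(scores) if scores else 1.0

hard = [max_words(20), must_include("invoice")]
print(reward("Please find the attached invoice for March.", hard, ["formal_tone"]))  # 1.0
print(reward("Hey, please find the attached invoice.", hard, ["formal_tone"]))       # 0.5
print(reward("Hey, gonna send the invoice lol", hard, ["formal_tone"]))              # 0.0
```

Taking the minimum means a response cannot score well by satisfying the easy checks while ignoring the soft constraint, which is one simple way to make the combined reward harder to game.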
-
🧠 BEACON DETECTS LLM HALLUCINATIONS WITHOUT NEEDING TO SEE INSIDE THE MODEL

A paper from today's arXiv new submissions introduces BEACON - a hallucination detection framework that works entirely from model outputs, with no access to internal parameters or external knowledge bases. The approach builds a 31-dimensional feature vector from structured multi-pass generation, combining semantic entropy, embedding geometry, chain-of-thought consistency, and paraphrase stability signals. A gradient-boosted classifier trained on 7,617 labelled examples across seven benchmarks achieves 0.81 AUROC.

The practical significance here is the "black-box" constraint. Most hallucination detection methods that perform well require white-box access, meaning they need the model's internal probability distributions or activations. That's fine for research, but it's incompatible with how most production AI systems are actually deployed: you call a third-party API and get text back. BEACON achieves comparable performance using only what you can observe from the outside. The paper also includes an efficient 5-call variant that achieves 0.78 AUROC - a concession to production latency constraints that most research papers don't bother making.

The finding from the feature-importance analysis is the one that deserves attention: "hallucination is inherently multi-dimensional." No single signal - not entropy alone, not consistency alone - is sufficient. You need the combination. That has implications for anyone who's been trying to solve the hallucination problem with a simpler, single-signal approach.

Sources:
- https://lnkd.in/efjM-3rB

#LLMResearch #Hallucination #AIReliability #NLP #MachineLearning
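Two of the ingredients mentioned in this post are easy to sketch with the Python standard library: a discrete entropy feature computed from several sampled answers to the same question (a crude cousin of semantic entropy), and AUROC, the metric the paper reports. This is not BEACON's 31-feature pipeline. Answers are grouped by normalised exact match rather than by a semantic-equivalence model, no classifier is trained, and the sample answers and labels are invented.

```python
import math
from collections import Counter

def normalise(answer: str) -> str:
    return " ".join(answer.lower().strip(" .").split())

def answer_entropy(samples: list[str]) -> float:
    """Entropy (nats) over clusters of equivalent answers; here a cluster = normalised exact match."""
    counts = Counter(normalise(s) for s in samples)
    n = len(samples)
    return sum((c / n) * math.log(n / c) for c in counts.values())

def auroc(scores: list[float], labels: list[int]) -> float:
    """Chance a random positive (1 = hallucination) outscores a random negative; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented example: four sampled answers per question, with a made-up hallucination label.
samples = [
    ["Paris", "paris.", "Paris", "Paris"],                     # 0: consistent and right
    ["1912", "1915", "1912", "1920"],                          # 1: inconsistent, hallucinated
    ["Blue whale", "blue whale", "Blue whale.", "Fin whale"],  # 0: mostly consistent
    ["42", "forty-two", "42", "Forty-two"],                    # 0: same meaning, different strings
    ["1887", "1887", "1887", "1887"],                          # 1: consistently wrong
]
labels = [0, 1, 0, 0, 1]

scores = [answer_entropy(s) for s in samples]
print(round(auroc(scores, labels), 3))  # 0.583
```

The consistently wrong answer gets zero entropy and slips through, and the "42" versus "forty-two" case is penalised for wording rather than meaning. Both illustrate the post's point that no single signal is sufficient, which is why BEACON combines many of them in a trained classifier.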
-
🏗️ AGENTIC AI IN CONSTRUCTION PAYMENTS: EARLYTRADE'S RAISE POINTS TO AN OVERLOOKED VERTICAL

Earlytrade announced yesterday that it has raised new funding, bringing its total to $25 million, with S3 Ventures and Brick & Mortar Ventures leading the round. The company operates a subcontractor payments marketplace for the construction industry - letting general contractors trade working capital with subcontractors - and plans to deploy agentic AI at the core of that infrastructure. Since launching in the US in 2024, it has achieved 7x revenue growth and facilitated over $3 billion in early payments globally.

Construction is exactly the kind of industry that AI funding coverage tends to skip over. It's not glamorous, the technology adoption curve is slow, and the businesses involved are often resistant to change. But it's also an industry where $2 trillion in annual US output moves through payment processes that are still fundamentally paper-based, with subcontractors routinely waiting 60 to 90 days for money they're already owed. The working capital problem isn't trivial: cash-flow constraints cause real harm to small trade contractors, and the manual overhead of managing thousands of subcontractor relationships at once is a genuine operational burden.

Agentic AI that can predict cash-flow needs, autonomously route capital, and optimise payment timing across thousands of relationships isn't science fiction for this use case. It's a direct product requirement. Earlytrade's bet is that whoever builds that infrastructure layer at scale will own an incredibly sticky position in a sector that's finally starting to modernise.

Hit follow if you find posts like this useful. Does your industry have an equally overlooked payment or workflow problem that AI could realistically solve?

#AgenticAI #ConstructionTech #FinTech #AIStartup #WorkingCapital
-
⚔️ ALTA ARES RAISED €50M TO MAKE AI-GUIDED DRONE INTERCEPTION CHEAPER THAN THE DRONES IT STOPS

The French defence startup announced its Series A yesterday, led by Air Street Capital with Cherry Ventures, OTB Ventures, and Harpoon Ventures joining. Founded in 2024 with just €2 million in seed funding, Alta Ares has built AI-guided interceptors that are already deployed in Ukraine, the Middle East, and Asia. The round validates both the technology and the fundamental economic problem it's trying to solve.

That economic problem is this: a Shahed-type attack drone costs somewhere between €20,000 and €50,000 to manufacture. The missiles historically used to destroy one can cost over a million euros each. Ukraine is now facing coordinated salvos of 600-plus drones and dozens of missiles in single-night attacks. The old air-defence model - one expensive missile per cheap drone - breaks down in that scenario, both financially and in terms of ammunition stockpiles.

Alta Ares builds interceptors designed to cost a fraction of the threats they eliminate, with AI handling targeting and guidance in place of expensive seeker heads and precision engineering. The company's X-Wing interceptor has been combat-tested, and Alta Ares has secured contracts across multiple active conflict zones. The new funding goes toward scaling production and expanding into Poland, Germany, and the US.

This is one of those cases where the practical application of AI is moving faster on the battlefield than in almost any other domain.

What's your read on the role AI-guided autonomous weapons will play in reshaping defence procurement over the next decade?

#DefenceTech #AIDefence #AltaAres #FranceTech #NATODefence
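The cost asymmetry in this post is simple arithmetic, sketched below with the post's own figures: €20,000 to €50,000 per drone, a salvo of 600 drones, and €1 million per missile as a lower bound on "over a million euros each". It assumes, purely for illustration, one missile per drone and every shot successful, which if anything flatters the missile-based defence. The constant and function names are hypothetical, and the interceptor price is deliberately not modelled because the post gives no figure for it.

```python
def exchange_ratio(defender_cost_per_kill: float, attacker_cost_per_drone: float) -> float:
    """Euros the defender spends per euro the attacker spends (above 1 favours the attacker)."""
    return defender_cost_per_kill / attacker_cost_per_drone

SALVO = 600                    # drones in one night (the post's "600-plus")
DRONE_COST = (20_000, 50_000)  # euros, the post's range for a Shahed-type drone
MISSILE_COST = 1_000_000       # euros, lower bound on "over a million euros each"

attacker_spend = [SALVO * c for c in DRONE_COST]                # cheapest and dearest drone
defender_spend = SALVO * MISSILE_COST                            # assumes one missile per drone
ratios = [exchange_ratio(MISSILE_COST, c) for c in DRONE_COST]

print(attacker_spend, defender_spend, ratios)  # [12000000, 30000000] 600000000 [50.0, 20.0]
```

On these assumptions a single night costs the attacker €12-30 million and the defender at least €600 million, a ratio of 20 to 50 in the attacker's favour. An interceptor that is "cheaper than the drones it stops", as in the headline, is one that pushes this ratio below 1.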
-
🤖 AI VISION MODELS CAN PREDICT DANGER - BUT THEY CAN'T READ YOUR FACE

Cornell researchers published a study yesterday testing whether AI vision-language models could predict how tense situations would end, either from full video or from facial expressions alone.

The results are genuinely interesting. The best models outperformed the average human at predicting outcomes from full video. But when shown only the facial expressions of people watching those same scenarios, the models failed badly. Humans, it turns out, extract enormous predictive signal from subtle facial cues - a narrowing of the eyes, slight tension in the jaw - that current AI systems essentially miss.

The practical implications are significant for anyone building robots or AI systems that need to operate in shared human spaces. A warehouse robot that can predict from a video feed whether a situation is about to go wrong is useful. But a robot that can't tell from your face that you're about to drop something, that you're frustrated, or that you've noticed something it hasn't is a robot that will keep being awkward, and potentially dangerous, in ways that are hard to anticipate from benchmarks alone.

The researchers at Cornell frame this as a "serious deficit in anticipatory social intelligence." I'd frame it as a reminder that the most important AI capabilities for human-robot interaction aren't the ones that show up in reasoning benchmarks. They're the ones that let a robot read a room.

What would it take, in your view, to close the facial expression gap?

#AIResearch #Robotics #HumanRobotInteraction #ComputerVision #PhysicalAI