
What we’re about
Silicon Valley Generative AI is a dynamic community of professionals, researchers, startup founders, and enthusiasts who share a passion for generative AI technology. As part of the wider GenAI Collective network, the group provides a fertile ground for the exploration of cutting-edge research, applications, and discussions on all things related to generative AI.
Our community thrives on two main types of engagement. First, in partnership with Boulder Data Science, we host bi-weekly "Paper Reading" sessions. These meetings are designed for deep dives into the latest machine learning papers, fostering a culture of continuous learning and collaborative research. They are an excellent opportunity for anyone who wants to understand in detail the scientific advances propelling the field forward.
Secondly, we organize monthly "Talks" that offer a broader range of insights into the world of generative AI. These sessions feature presentations by an eclectic mix of speakers, from industry pioneers and esteemed researchers to emergent startup founders and subject matter experts. Unlike the paper reading sessions, which are more academically inclined, the talks are tailored to appeal to a more general audience. Topics can span the gamut from the technical intricacies of the latest generative models to their real-world applications, startup pitches, and even discussions on the legal and ethical implications of AI.
Whether you're a seasoned professional or merely curious about generative AI, Silicon Valley Generative AI provides a comprehensive platform to learn, discuss, and network.
We strive to be an inclusive community that fosters innovation, knowledge-sharing, and a collective drive to shape the future of AI responsibly. Join us to stay at the forefront of generative AI research, news, and applications.
For those eager to dive deeper into the technical aspects, you can join us on the GenAI Collective Slack, specifically the #discuss-technical channel, to keep the conversations flowing between meetups.
We are also looking for the following:
• Readers: people who are willing to read papers and speak about them.
• Speakers and presenters: people who will put together educational materials, present them to the group, and answer questions.
• Industry events: if you have a generative AI event such as a hackathon, a lunch and learn, or an information session on your product, we would be happy to include it in the calendar.
Please contact Matt White here or at contact@matt-white.com
Upcoming events (4+)
- Reinforcement Learning: Chapter 3, Finite Markov Decision Processes
In the last meeting we introduced the agent-environment interface, the dynamics equation, and the discounted return. In this meeting we use those concepts to define the value function and show how a policy determines a value function in a given environment. From there we will study some examples with concrete states and values and use them to explore how to reach the optimal policy and value function.
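For anyone who wants to experiment before the meeting, here is a minimal sketch of these ideas in plain Python. The two-state environment, its rewards, and the discount factor of 0.9 are invented for illustration and do not come from the textbook; the function names are likewise hypothetical. It shows the discounted return, the value function of a fixed policy (iterative policy evaluation), and value iteration as one way to reach an optimal policy.

```python
# Hypothetical two-state MDP, invented for illustration (not from the textbook).
# P[state][action] is a list of (probability, next_state, reward) triples.
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}
GAMMA = 0.9  # assumed discount factor


def discounted_return(rewards, gamma=GAMMA):
    """Return r_1 + gamma * r_2 + gamma^2 * r_3 + ... for a finite reward list."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_value(v, state, action, gamma=GAMMA):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in P[state][action])


def evaluate_policy(policy, gamma=GAMMA, tol=1e-10):
    """Iterative policy evaluation for a deterministic policy {state: action}."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            new_v = q_value(v, s, policy[s], gamma)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


def value_iteration(gamma=GAMMA, tol=1e-10):
    """Apply the Bellman optimality backup until the values stop changing."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            new_v = max(q_value(v, s, a, gamma) for a in P[s])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            break
    # The greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(v, s, a, gamma)) for s in P}
    return v, policy
```

In this toy environment, `value_iteration()` settles on going from A to B and then staying in B, with values of roughly 19 for A and 20 for B, while `evaluate_policy({"A": "stay", "B": "stay"})` shows that the same environment has a different value function under a different policy.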
As usual you can find below links to the textbook, previous chapter notes, slides, and recordings of some of the previous meetings.
Useful Links:
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Recordings of Previous Meetings
Short RL Tutorials
My exercise solutions and chapter notes
Kickoff Slides which contain other links
Video lectures from a similar course
- Generative AI Paper Reading: Log-Linear Attention
Join us for a paper discussion on "Log-Linear Attention", presented by Evelyn.
Exploring a new attention mechanism that balances efficiency and expressiveness for long-sequence modeling
Featured Paper:
"Log-Linear Attention" (Guo et al., 2025)
arXiv Paper
Discussion Topics:
Motivation & Background
- Standard softmax attention in Transformers has quadratic compute and linear memory, which limits scalability for long sequences
- Linear attention and state-space models enable linear-time, constant-memory computation but rely on a fixed-size hidden state (RNN-like), which limits context modeling
- Need for an approach that is both efficient and expressive, especially for long-context tasks
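To make the trade-off above concrete, here is a small sketch in plain Python. It uses unnormalized causal linear attention (no softmax, no gating), which is a simplification chosen for illustration and not the exact formulation of any model in the paper; the function names are hypothetical. The first function computes every pairwise score, so its work grows quadratically with sequence length. The second produces the same outputs while carrying only a fixed-size state matrix from one step to the next.

```python
def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def causal_attention_quadratic(Q, K, V):
    """o_t = sum over s <= t of (q_t . k_s) * v_s, via all pairwise scores."""
    d_v = len(V[0])
    out = []
    for t, q in enumerate(Q):
        o = [0.0] * d_v
        for s in range(t + 1):  # every earlier position is revisited
            w = dot(q, K[s])
            for j in range(d_v):
                o[j] += w * V[s][j]
        out.append(o)
    return out


def causal_attention_recurrent(Q, K, V):
    """Same outputs using a fixed-size state S of shape d_k x d_v."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        # Fold the new key/value pair into the state: S += outer(k, v).
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] += k[i] * v[j]
        # Read out with the query: o = q^T S.
        out.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out
```

The recurrent form never looks back at earlier keys and values, which is where the constant memory comes from, and also why everything the model remembers must fit into one `d_k x d_v` matrix. Softmax attention does not admit this rewrite, because its normalization depends on all earlier scores.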
Log-Linear Attention Mechanism
- Maintains a set of hidden states that grows logarithmically with sequence length (vs. fixed-size in linear attention)
- Uses Fenwick tree–based (hierarchical) partitioning to summarize past context at multiple temporal scales
- Enables O(T log T) compute and O(log T) memory for decoding; supports parallel, matmul-rich training
- Generalizes existing linear attention models and can be applied to architectures like Mamba-2 and Gated DeltaNet
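One way to build intuition for the logarithmic growth is the standard Fenwick tree (binary indexed tree) decomposition of a prefix, sketched below in plain Python. This is the textbook decomposition and only an assumption about the general shape of the idea; the paper's exact bucketing may differ in details such as how the current token is handled. The function name `fenwick_buckets` is hypothetical.

```python
def fenwick_buckets(t):
    """Split the prefix [0, t) into half-open intervals of power-of-two length.

    Intervals are returned most recent first. The most recent interval has the
    smallest length, so nearby context is summarized at the finest scale and
    distant context at coarser scales.
    """
    buckets = []
    end = t
    while end > 0:
        size = end & -end  # lowest set bit of `end`, always a power of two
        buckets.append((end - size, end))
        end -= size
    return buckets


if __name__ == "__main__":
    for t in (1, 6, 13, 1024):
        print(t, fenwick_buckets(t))
```

For a prefix of length 13 this yields the intervals (12, 13), (8, 12), and (0, 8). The number of intervals equals the number of set bits in `t`, which is at most `t.bit_length()`, so keeping one summary state per interval gives a state count that grows logarithmically with sequence length.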
Implementation & Algorithm
- Chunkwise parallel scan algorithm for efficient training
- Hierarchical masking matrix structure (quasi-H matrix) enables low-rank, blockwise computation
- Custom Triton kernel implementation outperforms FlashAttention-2 for long sequences
Performance Benchmarks
| Model/Variant | Scaling | Training compute | Decoding memory |
| ------------- | ------- | ---------------- | --------------- |
| FlashAttention-2 | Quadratic (baseline) | O(T²) | O(T) |
| Mamba-2 | Linear | O(T) | O(1) |
| Log-Linear Mamba-2 | Log-linear | O(T log T) | O(log T) |
| Gated DeltaNet | Linear | O(T) | O(1) |
| Log-Linear Gated DeltaNet | Log-linear | O(T log T) | O(log T) |
- Log-linear variants consistently outperform linear counterparts on synthetic recall tasks, language modeling (perplexity), and long-context retrieval
- Improved per-position loss and recall on "Needle-In-A-Haystack" and real-world benchmarks at long sequence lengths
Implementation Challenges
- Efficient hierarchical memory management for chunked computation
- Balancing expressiveness (multi-scale context) with computational cost
- Integrating log-linear attention into diverse model architectures
Key Technical Features
- Logarithmic growth of hidden states with sequence length
- Matmul-friendly parallelization for hardware efficiency
- Less than 3% parameter increase over baseline models
- Compatible with modern accelerators (GPU/TPU) and existing linear attention frameworks
Future Directions
- Applying log-linear attention to other state-space and convolutional models
- Further optimizing hierarchical memory structures for even longer contexts
- Exploring applications in domains requiring efficient long-sequence modeling (e.g., genomics, document understanding)
---
Silicon Valley Generative AI has two meeting formats.
1. Paper Reading - Every second week we meet to discuss machine learning papers. This is a collaboration between Silicon Valley Generative AI and Boulder Data Science.
2. Talks - Once a month we meet to have someone present on a topic related to generative AI. Speakers range from industry leaders, researchers, startup founders, and subject matter experts to anyone with an interest in a topic who would like to share it. Topics vary from technical to business-focused: how the latest generative models work and how they can be used, applications and adoption of generative AI, demos of projects, startup pitches, and legal and ethical topics. The talks are meant to be inclusive and aimed at a more general audience than the paper readings.
If you would like to be a speaker, please contact:
Matt White
- Reinforcement Learning: Topic TBA
Typically covers chapter content from Sutton and Barto's RL book