Alt account of @Badabinski

Just a sweaty nerd interested in software, home automation, emotional issues, and polite discourse about all of the above.

  • 0 Posts
  • 11 Comments
Joined 2 years ago
Cake day: June 9th, 2024

  • The segmented caching request thing is… weird. I worked for a company that developed a caching proxy and it very much did not work that way. Random access in a caching system is usually kinda bad and you should try to avoid it. Our proxy manually controlled the disk (it wasn’t a mounted filesystem) so it could constantly sweep the head across the disk and queue up reads and writes optimally. This gets much harder when things are fragmented as fuck.

    If the concern was about what would happen with multiple connections for the same cache miss, then the caching proxy should just combine the client-side connections into a single upstream one. You can still cache the first part of the response if your upstream connection gets terminated and then restart it from that point.
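To make the "combine the client-side connections into a single upstream one" idea concrete, here is a minimal sketch of request coalescing. It is not how any particular proxy implements it: `CoalescingFetcher` and `fetch_upstream` are hypothetical names, the cache is a plain in-memory dict, and resuming a partially cached response after the upstream drops is left out.

```python
import threading
from concurrent.futures import Future


class CoalescingFetcher:
    """Collapse concurrent cache misses for one key into a single upstream fetch."""

    def __init__(self, fetch_upstream):
        # fetch_upstream is a hypothetical callable: key -> response body
        self._fetch_upstream = fetch_upstream
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> Future shared by every waiting client
        self._cache = {}      # key -> cached body (in-memory stand-in for real storage)

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                # First client to miss becomes the leader and owns the upstream fetch.
                future = Future()
                self._in_flight[key] = future

        if not leader:
            # Followers wait for the leader's result (or re-raise its exception).
            return future.result()

        try:
            body = self._fetch_upstream(key)
        except Exception as exc:
            with self._lock:
                del self._in_flight[key]
            future.set_exception(exc)
            raise

        with self._lock:
            self._cache[key] = body
            del self._in_flight[key]
        future.set_result(body)
        return body
```

However many clients ask for the same missing object at once, `fetch_upstream` runs once for that key. Everyone else either waits on the shared future or hits the cache once the fetch has finished.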





  • Yeah, it’s so fucking frustrating. I felt conflicted writing this because I don’t want to reduce anyone’s resistance to the garbage being pushed by the big corpos. We should be saying “no” as strongly as possible at every encroachment. I just don’t want essential research to take the blow as well. A lot of environmental research benefits from satellite imagery, and anything we can do to glean more information from that is a good thing.

    Damned if you do, damned if you don’t. You can’t really expect the average person to learn the distinction between the good and the bad here. You can try to educate folks, but people already have enough shit on their plates as it is.


  • This is why “AI” is such a shit term. This is not a general purpose generative model, which is what you (and I) should (and very clearly do) dislike.

    This is a model that is designed to operate on a very specific set of data and extract information from it. It was created by people at the University of Cambridge, not one of the big shitty companies. It’s not something that you run all the time, it’s something you only run when gathering data for research purposes. The model was trained on truly freely available data. No nonconsensual large-scale scraping was used to train this model, so it’s free of the ethical concerns typically associated with “AI”. Since it’s something a research group would run by themselves on a single (albeit very powerful) machine, it has very modest power requirements.

    Models like this have been around for at least 15 years in the research space, and they don’t deserve your ire. It’s one of the truly good uses of ML.

    If you want more details on the system, it’s all open source and can be found here: https://github.com/ucam-eo/tessera

    EDIT: Please don’t take this as me trying to defend LLMs and image generators. I fucking hate LLMs and image generators. People at my workplace have described me as “the anti-AI guy” because I really am. I think almost all of the ML products made by OpenAI, Anthropic, and others are unethical and also just shit.