• 0 Posts
  • 95 Comments
Joined 3 years ago
Cake day: June 20th, 2023


  • You can’t just write off capital expenditure, though. The hardware, even for “efficient” MoE inference, is still very expensive to buy, house, run, and cool. Even assuming open-weight model serving with $0 in R&D for the models themselves, prefill-heavy workloads don’t batch well with decode-heavy concurrency (or with other prefill-heavy jobs). The moment you do anything nontrivial, you run into architectural problems that are very complicated to solve efficiently at scale.

    Hardware that is useful for 5-10 years at most, plus development and support for the inference workflows, doesn’t leave a lot of margin on the table.

    My gut, along with basically everything I read, suggests that most shops (even pure inference ones) are not profitable and are still floating on loans or VC money.
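To make the margin argument concrete, here is a rough sketch of how capex and power amortize into an hourly cost. Everything in it is an assumption for illustration: the function name, the PUE-style cooling multiplier, the simplification that the box draws full power even when idle, and all of the input numbers, which are placeholders picked to keep the arithmetic easy and are not real prices.

```python
HOURS_PER_YEAR = 8760


def cost_per_utilized_hour(capex_usd, lifetime_years, power_kw,
                           usd_per_kwh, pue=1.0, utilization=1.0):
    """Amortized hardware plus power cost for each hour of useful work.

    pue is a cooling/overhead multiplier on power draw (1.0 = no overhead).
    utilization is the fraction of hours the hardware does billable work.
    Simplification: power is drawn at the same rate whether busy or idle.
    """
    capex_per_hour = capex_usd / (lifetime_years * HOURS_PER_YEAR)
    power_per_hour = power_kw * pue * usd_per_kwh
    return (capex_per_hour + power_per_hour) / utilization


# Placeholder inputs, not market data.
fully_busy = cost_per_utilized_hour(87_600, 10, 1.0, 0.10)
half_busy = cost_per_utilized_hour(87_600, 10, 1.0, 0.10, utilization=0.5)
print(f"fully busy: ${fully_busy:.2f}/h, half busy: ${half_busy:.2f}/h")
```

The point of the sketch is the shape of the formula: a shorter useful life or lower utilization pushes the cost per billable hour up fast, and that is before staff, development, and support.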




  • People shit on it but there’s a lot of good open-source tooling that supports it.

    There are NIST L1 profiles

    Tutorials and guides for everything

    etc

    Part of being a good sysadmin is knowing when not to reinvent the wheel. Ubuntu has a lot of options for vetted, hardened, “other people’s wheels.”

    Also, for posterity, the competent ones are running the headless server version of Ubuntu (as opposed to the bloated mess that is Ubuntu Desktop). The server version catches a lot of flak it doesn’t deserve.






  • If Caddy is acting as a proxy for a service, you should not need to forward that service’s port externally. Host firewall rules that allow traffic from your local network are sufficient.

    Depending on your physical host layout, you may be looking at an issue with NAT reflection.

    You have not given us enough about your topology to assist in troubleshooting.


  • There are server chips like the E7-8891 v3 which lived in a weird middle ground of supporting both DDR3 and DDR4. On paper, it’s about on par with a Ryzen 5 5500, and they’re about $20 on US eBay. I’ve been toying with the idea of buying an aftermarket/used server board to see if it holds up the way it appears to on paper. $20 for a CPU (could even slot 2), $80 for a board, $40 for 32 GB of DDR3 in quad channel. ~$140 (or ~$160 with a second CPU) for a set of core components doesn’t seem that bad in modern times, especially if you can use quad/oct channel to offset the bandwidth difference between DDR3 and DDR4.

    I think finding a cooler and a case would be the hardest part
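For anyone who wants to check the numbers, here is a quick back-of-the-envelope sketch. The prices are the ones quoted above. The memory speeds (DDR3-1600 and DDR4-3200) are assumptions picked for illustration, and the result is theoretical peak bandwidth only; real speeds on that platform may well be lower.

```python
# Prices come from the comment above; memory speeds are ASSUMED for illustration.
PRICES_USD = {"cpu": 20, "board": 80, "ram_32gb_ddr3": 40}


def build_cost(cpu_count: int = 1) -> int:
    """Total cost of the core components for a given number of CPUs."""
    return (PRICES_USD["cpu"] * cpu_count
            + PRICES_USD["board"]
            + PRICES_USD["ram_32gb_ddr3"])


def peak_bandwidth_gbs(channels: int, mega_transfers: int) -> float:
    """Theoretical peak in GB/s: channels x MT/s x 8 bytes per 64-bit transfer."""
    return channels * mega_transfers * 8 / 1000


single = build_cost(1)
dual = build_cost(2)
ddr3_quad = peak_bandwidth_gbs(4, 1600)  # assumed DDR3-1600, quad channel
ddr4_dual = peak_bandwidth_gbs(2, 3200)  # assumed DDR4-3200, dual channel

print(f"one CPU: ${single}, two CPUs: ${dual}")
print(f"DDR3 quad: {ddr3_quad} GB/s, DDR4 dual: {ddr4_dual} GB/s")
```

Under those assumed speeds, doubling the channel count makes up for DDR3 running at half the transfer rate, which is the whole premise of the build.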






  • All of those would be perfectly cromulent nodes for small containers. The first issue you’ll run into is the low RAM. Some homelab projects would cause you to exceed 8 GB, but the good news is that if you’re using an external backend via NFS, you can always scale out (more nodes) or up (more compute per node) later with minimal headache.

    If you’re going to be memory constrained, don’t waste 1-2 GB on a GUI; install Ubuntu/Debian/whatever headless.





  • No offence, but the problem is that an app forces me to trust you; a website does not. I have tighter and easier control over a web request than I do over an app, and even if an app doesn’t have these permissions today, an update, or an update after a sale, could trivially and silently introduce them.

    With a website, it’s obvious when the deal changes: you put up a login wall to harvest data; I stop using the site. You put trackers and ads into the UI; I block them at the DNS level.