• 11 Posts
  • 429 Comments
Joined 3 years ago
Cake day: July 6th, 2023



  • --no-default-features --features=foo,bar is fine. The harmful part is replacing established crates with smaller ones solely because of their size.

    And the whole dance doesn’t even do what many people think it does, as covered in some comments in that linked old thread.

    Note that I made that jerk thread when min-dependency-ing was sort of trending. A trend that was often (not always) as stupid and counter-productive as the other related(-ish) trend min-binary-sizing.

    Also note that the harmfulness of that trend went beyond bad ecosystem dynamics and dependants ending up with less quality/reliability. That semi-obsession also encouraged bad coding practices, like pervasive use of dyn where it’s not needed. Compromising idiomaticity, and removing zero-cost abstractions, to win a faster-compiling dependency prize!

    I will stop here, because I really don’t want to write that blog post.


  • Nah. There was space for simple (in a good way) native (no VM) GC languages to sit between scripted (python) and VM(-first) languages (java) on one side, and no-GC languages on the other. Not doing OOP and not doing exceptions were good decisions as a starting point. But the problem is that this was based on following C (instead of C++) and nothing else, making no use of decades of programming language research and development*. And it’s the (un)design that followed that ended up creating a horrible, simple (in a bad way) language.

    And this google-branded language hogged that space where other, better languages could have been developed, or where those which got developed could have flourished more and gained more momentum. And the corporate marketing actually tried to sell every bad design aspect as “akshually a feature”. For example, lack of generics was celebrated for years as great simplicity, until an almost deliberately bad implementation of generics got added later, as you mentioned.

    tl;dr: The surface premise of the language was good and arguably needed at the time. What got delivered was bad.


    * An observant historian would point out here that some good (arguably better even) languages predate C itself (e.g. the ML family).


  • Go is not even good. It’s horribly designed (or rather, un-designed, since its creators actually boasted about deliberately doing the core part in, what, a couple of weeks IIRC). If it weren’t for the associated corporate brand, it would have been a dead meme in everyone’s eyes by 2015, when Rust hit v1.0 (it was born a meme in the eyes of those in the know).

    And I mentioned that date to point out that we can’t call these languages new forever 😉 . In fact, if you took a snapshot of street tech talk from 10 years ago, you would see that these generic conventional unwisdom comparisons have been done to death already. Which then begs the question: what newfound “wisdom” needed to be added to these “muh best tool for the job” talking points? Or are we just testing the wisdom of our new best tool for all jobs, LLMs?



    • Use zram so swapping doesn’t immediately slow things to a crawl.
    • Use cargo check, often. You don’t need to always compile.
    • Add a release-dev profile that inherits release, use cranelift for codegen in it, and turn off lto.
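    A minimal sketch of such a profile in Cargo.toml. Note that the per-profile codegen-backend key is nightly-only and assumes the cranelift backend component is installed; the profile name is just an example:

```toml
# nightly-only: allow choosing a codegen backend per profile
cargo-features = ["codegen-backend"]

[profile.release-dev]
inherits = "release"
# assumes rustc_codegen_cranelift is available on the toolchain
codegen-backend = "cranelift"
lto = false
```

    Then build with cargo +nightly build --profile release-dev.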

    Otherwise, it would be useful to know what kind of system you’re running, and what the system load is like without any Rust dev involvement. It would also be helpful to provide specifics. Your descriptions are very generic and could be entirely constructed from Rust memes.




  • One can use custom viewers via core.pager and interactive.diffFilter in git configuration, not to mention defining custom difftools directly.

    I primarily use delta for this (sometimes packaged as git-delta), which itself is implemented in Rust too.

    For example, save this as a script called delta-s somewhere in $PATH:

    #!/bin/bash
    
    delta -s \
      --minus-style='syntax #400000' \
      --plus-style='syntax #004000' \
      --minus-emph-style='normal #a00000' \
      --plus-emph-style='normal #00a000' \
      --line-buffer-size=48 \
      --max-line-distance=0.8 "$@"
    

    Then, in ~/.gitconfig, add

    [difftool "d-sbs"]
      cmd = diff -u "$LOCAL" "$REMOTE" | delta-s
    

    And then you can just

    git difftool --tool d-sbs HEAD~
    

    You can further create an alias for that too of course.
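    For instance, a hypothetical alias in ~/.gitconfig (the name is arbitrary):

```ini
[alias]
  # shorthand for: git difftool --tool d-sbs
  dsbs = difftool --tool d-sbs
```

    After which git dsbs HEAD~ does the same thing.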


  • Didn’t read the whole thing because I had to stop at the right column at the start.

    Federated is “decentralized”. The correct word the author is looking for is “distributed”. And even then, direct exclusive P2P is only one form of “distributed”. Hybrid/multiple approaches are also widespread (torrents, anyone!).

    Not sure how a technical writer gets such a basic aspect wrong.

    Also, beyond the closed-source aspect, and being a closed-up platform in general, Discord was always literal spyware. And pretending that open-source projects which chose to use it didn’t know what they were doing, while glossing over reactions that ranged from collective nagging to almost literal fights in some communities because of such choices, reeks of willful blindness.


  • It’s laughable before you even get to the code. You know, doing “eval bad” when all the build scripts are written in bash 🤣

    There is also no protection for VCS sources (assuming no revision hash is specified) in makepkg (no “locking” with a stored content hash). So, if an AUR package maintainer is malicious, they can push whatever they want from the source side. They can do that in any case, obviously. But with VCS sources, they can do it at any moment, transparently. In other words, your primary concern should be knowing that the sources come from a trustable upstream (and hoping no xz-like fiasco is taking place). Checking that the PKGBUILD/install files are not fishy is the easier part (and should be done by a human). And if you’re using AUR packages to the extent where this is somehow a daunting, time-consuming task, then there is something wrong with you in any case.
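    To illustrate, a hypothetical PKGBUILD fragment (the URL and commit are placeholders): an unpinned VCS source is marked SKIP, so its content is never hash-locked, while pinning a commit via a source fragment closes that window:

```bash
# unpinned VCS source: makepkg checks out the branch tip, nothing is hash-locked
source=("git+https://example.com/upstream/foo.git")
sha256sums=('SKIP')

# pinned variant: a specific commit is checked out instead
# source=("git+https://example.com/upstream/foo.git#commit=<full-commit-hash>")
```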

    Edit: That is not to say the author of the tool wouldn’t just fit right in with the security theater crowd. Hell, some of them even created whole businesses using not-too-dissimilar theater components.

    @kadu@scribe.disroot.org


  • So try_from(&**p) is not a code smell/poor form in Rust?

    No. It’s how you (explicitly) go from ref to deref.

    Here:

    • p is &PathBuf
    • *p is PathBuf
    • **p is Path (Deref)
    • And &**p is &Path.

    Since what you started with is a reference to a non-Copy value, you can’t do anything that would use/move *p or **p. Furthermore, Path is an unsized type (just like str and [T]), so you need to reference it (or Box it) in any case.
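    A minimal sketch of that chain (the path value is just an example):

```rust
use std::path::{Path, PathBuf};

fn main() {
    // hypothetical value, just to walk the chain above
    let buf = PathBuf::from("a/b.txt");
    let p: &PathBuf = &buf;
    // *p is PathBuf, **p is Path (via Deref), so &**p re-borrows as &Path
    let q: &Path = &**p;
    assert_eq!(q, Path::new("a/b.txt"));
}
```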

    Another way to do this is:

    let p: &Path = p.as_ref();
    

    Some APIs use AsRef in signatures to allow passing references of different types directly (e.g. File::open()), but that doesn’t apply here.



  • Let’s do this incrementally, shall we?

    First, let’s make get_files_in_dir() idiomatic. We will get back to errors later.

    fn get_files_in_dir(dir: &str) -> Option<Vec<PathBuf>> {
        fs::read_dir(dir)
            .ok()?
            .map(|res| res.map(|e| e.path()))
            .collect::<Result<Vec<_>, _>>()
            .ok()
    }
    

    Now, in read_parquet_dir(), if the unwraps stem from confidence that we will never get errors, then we can confidently ignore them (we will get back to the errors later).

    fn read_parquet_dir(entries: &Vec<String>) ->  impl Iterator<Item = record::Row> {
        // ignore all errors
        entries.iter()
            .cloned()
            .filter_map(|p| SerializedFileReader::try_from(p).ok())
            .flat_map(|r| r.into_iter())
            .filter_map(|r| r.ok())
    }
    

    Now, let’s go back to get_files_in_dir(), and not ignore errors.

    fn get_files_in_dir(dir: &str) -> Result<Vec<PathBuf>, io::Error>
    {
        fs::read_dir(dir)?
            .map(|res| res.map(|e| e.path()))
            .collect::<Result<Vec<_>, _>>()
    }
    
     
     fn main() -> Result<(), io::Error> {
         let args = Args::parse();
    -    let entries = match get_files_in_dir(&args.dir)
    -    {
    -        Some(entries) => entries,
    -        None => return Ok(())
    -    };
    -
    +    let entries = get_files_in_dir(&args.dir)?;
     
         let mut wtr = WriterBuilder::new().from_writer(io::stdout());
         for (idx, row) in read_parquet_dir(&entries.iter().map(|p| p.display().to_string()).collect()).enumerate() {
    
    

    Now, SerializedFileReader::try_from() is implemented for &Path, and PathBuf derefs to Path. So your dance of converting to display then to string (which is lossy, btw) is not needed.

    While we’re at it, let’s use a slice instead of &Vec<_> in the signature (clippy would tell you about this if you have it set up with rust-analyzer).

    
    fn read_parquet_dir(entries: &[PathBuf]) ->  impl Iterator<Item = record::Row> {
        // ignore all errors
        entries.iter()
            .filter_map(|p| SerializedFileReader::try_from(&**p).ok())
            .flat_map(|r| r.into_iter())
            .filter_map(|r| r.ok())
    }
    
         let entries = get_files_in_dir(&args.dir)?;
     
         let mut wtr = WriterBuilder::new().from_writer(io::stdout());
    -    for (idx, row) in read_parquet_dir(&entries.iter().map(|p| p.display().to_string()).collect()).enumerate() {
    +    for (idx, row) in read_parquet_dir(&entries).enumerate() {
             let values: Vec<String> = row.get_column_iter().map(|(_column, value)| value.to_string()).collect();
             if idx == 0 {
                 wtr.serialize(row.get_column_iter().map(|(column, _value)| column.to_string()).collect::<Vec<String>>())?;
    
    
    

    Now let’s see what we can do about not ignoring errors in read_parquet_dir().


    Approach 1: Save intermediate reader results

    This consumes all readers before getting further. So, it’s a behavioral change. The signature may also scare some people 😉

    fn read_parquet_dir(entries: &[PathBuf]) -> Result<impl Iterator<Item = Result<record::Row, ParquetError>>, ParquetError> {
        Ok(entries
            .iter()
            .map(|p| SerializedFileReader::try_from(&**p))
            .collect::<Result<Vec<_>, _>>()?
            .into_iter()
            .flat_map(|r| r.into_iter()))
    }
    

    Approach 2: Wrapper iterator type

    How can we combine errors from readers with flat record results?

    This is how.

    enum ErrorOrRows {
        Error(Option<ParquetError>),
        Rows(record::reader::RowIter<'static>)
    }
    
    impl Iterator for ErrorOrRows {
        type Item = Result<record::Row, ParquetError>;
    
        fn next(&mut self) -> Option<Self::Item> {
            match self {
                Self::Error(e_opt) => e_opt.take().map(Err),
                Self::Rows(row_iter) => row_iter.next(),
            }
        }
    }
    
    fn read_parquet_dir(entries: &[PathBuf]) ->  impl Iterator<Item = Result<record::Row, ParquetError>>
    {
        entries
            .iter()
            .flat_map(|p| match  SerializedFileReader::try_from(&**p) {
                Err(e) => ErrorOrRows::Error(Some(e)),
                Ok(sr) => ErrorOrRows::Rows(sr.into_iter()),
            })
    }
    
     
         let mut wtr = WriterBuilder::new().from_writer(io::stdout());
         for (idx, row) in read_parquet_dir(&entries).enumerate() {
    +        let row = row?;
             let values: Vec<String> = row.get_column_iter().map(|(_column, value)| value.to_string()).collect();
             if idx == 0 {
                 wtr.serialize(row.get_column_iter().map(|(column, _value)| column.to_string()).collect::<Vec<String>>())?;
    

    Approach 3 (bonus): Using unstable #![feature(gen_blocks)]

    fn read_parquet_dir(entries: &[PathBuf]) ->  impl Iterator<Item = Result<record::Row, ParquetError>> {
        gen move {
            for p in entries {
                match SerializedFileReader::try_from(&**p) {
                    Err(e) => yield Err(e),
                    Ok(sr) => for row_res in sr { yield row_res; }
                }
            }
        }
    }
    


  • As with all ads, especially M$ ones…
    No Code, Don’t Care

    At least if the code were available, I would find out what they mean by “spoofed MIME” and how that attack vector works (is the actual file “magic” header spoofed, but the file still manages to get parsed with its non-“spoofed” actual format nonetheless?! How?).

    Also, I would have figured out if this is a new use of “at scale” applied to purely client code, or if a service is actually involved.



  • If I understand what you’re asking…

    This leaves some details/specifics out to simplify. But basically:

    async fn foo() {}
    
    // ^ this roughly desugars to
    
    fn foo() -> impl Future<Output = ()> {}
    

    This meant that you couldn’t just have (stable) async methods in traits, not because of async itself, but because you couldn’t use impl Trait in return positions in trait methods, in general.

    Box<dyn Future> was an unideal workaround (not zero-cost, plus other dyn drawbacks). async_trait was a proc-macro solution that generated code with that workaround. So Box<dyn Future> was never a desugaring done by the language/compiler.

    Now that we have (stable) impl Trait in return positions in trait methods, all this dance is not strictly needed anymore, and hasn’t been for a while.
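    A compile-checkable sketch of that rough desugaring (function names are just placeholders):

```rust
use std::future::Future;

// the sugared form
async fn foo() -> u32 {
    1
}

// roughly the desugared form
fn foo_desugared() -> impl Future<Output = u32> {
    async { 1 }
}

fn main() {
    // both produce (unnamed) types implementing Future<Output = u32>
    fn assert_fut(_f: impl Future<Output = u32>) {}
    assert_fut(foo());
    assert_fut(foo_desugared());
}
```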